NEISSERIAL ANTIGENS



This invention relates to antigens from Neisseria bacteria. 
BACKGROUND ART 

Neisseria meningitidis and Neisseria gonorrhoeae are non-motile, gram negative diplococci that 
are pathogenic in humans. N. meningitidis colonises the pharynx and causes meningitis (and,
occasionally, septicaemia in the absence of meningitis); N. gonorrhoeae colonises the genital tract
and causes gonorrhea. Although they colonise different areas of the body and cause completely
different diseases, the two pathogens are closely related. One feature that clearly
differentiates meningococcus from gonococcus is the polysaccharide capsule that is
present in all pathogenic meningococci.

N. gonorrhoeae caused approximately 800,000 cases per year during the period 1983-1990 in the
United States alone (chapter by Meitzner & Cohen, "Vaccines Against Gonococcal Infection", In:
New Generation Vaccines, 2nd edition, ed. Levine, Woodrow, Kaper, & Cobon, Marcel Dekker,
New York, 1997, pp. 817-842). The disease causes significant morbidity but limited mortality.
Vaccination against N. gonorrhoeae would be highly desirable, but repeated attempts have failed.
The main candidate antigens for this vaccine are surface-exposed proteins such as pili, porins,
opacity-associated proteins (Opas) and other surface-exposed proteins such as the Lip, Laz, IgA1
protease and transferrin-binding proteins. The lipooligosaccharide (LOS) has also been suggested
as a vaccine (Meitzner & Cohen, supra).

N. meningitidis causes both endemic and epidemic disease. In the United States the attack rate is
0.6-1 per 100,000 persons per year, and it can be much greater during outbreaks (see Lieberman
et al. (1996) Safety and Immunogenicity of a Serogroups A/C Neisseria meningitidis
Oligosaccharide-Protein Conjugate Vaccine in Young Children. JAMA 275(19):1499-1503;
Schuchat et al (1997) Bacterial Meningitis in the United States in 1995. N Engl J Med 337(14):970-976).
In developing countries, endemic disease rates are much higher and during epidemics
incidence rates can reach 500 cases per 100,000 persons per year. Mortality is extremely high, at
10-20% in the United States, and much higher in developing countries. Following the introduction
of the conjugate vaccine against Haemophilus influenzae, N. meningitidis is the major cause of
bacterial meningitis at all ages in the United States (Schuchat et al (1997) supra).

Based on the organism's capsular polysaccharide, 12 serogroups of N. meningitidis have been
identified. Group A is the pathogen most often implicated in epidemic disease in sub-Saharan
Africa. Serogroups B and C are responsible for the vast majority of cases in the United States and
in most developed countries. Serogroups W135 and Y are responsible for the rest of the cases in
the United States and developed countries. The meningococcal vaccine currently in use is a
tetravalent polysaccharide vaccine composed of serogroups A, C, Y and W135. Although
efficacious in adolescents and adults, it induces a poor immune response and short duration of
protection, and cannot be used in infants [eg. Morbidity and Mortality weekly report, Vol.46, No.
RR-5 (1997)]. This is because polysaccharides are T-cell independent antigens that induce a weak
immune response that cannot be boosted by repeated immunization. Following the success of the
vaccination against H. influenzae, conjugate vaccines against serogroups A and C have been
developed and are at the final stage of clinical testing (Zollinger WD "New and Improved Vaccines
Against Meningococcal Disease" in: New Generation Vaccines, supra, pp. 469-488; Lieberman et
al (1996) supra; Costantino et al (1992) Development and phase I clinical testing of a conjugate
vaccine against meningococcus A and C. Vaccine 10:691-698).

Meningococcus B remains a problem, however. This serotype currently is responsible for
approximately 50% of total meningitis in the United States, Europe, and South America. The
polysaccharide approach cannot be used because the menB capsular polysaccharide is a polymer
of α(2-8)-linked N-acetyl neuraminic acid that is also present in mammalian tissue. This results in
tolerance to the antigen; indeed, if an immune response were elicited, it would be anti-self, and
therefore undesirable. In order to avoid induction of autoimmunity and to induce a protective
immune response, the capsular polysaccharide has, for instance, been chemically modified by
substituting the N-acetyl groups with N-propionyl groups, leaving the specific antigenicity
unaltered (Romero & Outschoorn (1994) Current status of Meningococcal group B vaccine
candidates: capsular or non-capsular? Clin Microbiol Rev 7(4):559-575).

Alternative approaches to menB vaccines have used complex mixtures of outer membrane proteins 
(OMPs), containing either the OMPs alone, or OMPs enriched in porins, or deleted of the class 4 
OMPs that are believed to induce antibodies that block bactericidal activity. This approach 
produces vaccines that are not well characterized. They are able to protect against the homologous 
strain, but are not effective at large where there are many antigenic variants of the outer membrane
proteins. To overcome the antigenic variability, multivalent vaccines containing up to nine different
porins have been constructed (eg. Poolman JT (1992) Development of a meningococcal vaccine.
Infect. Agents Dis. 4:13-28). Additional proteins to be used in outer membrane vaccines have been
the opa and opc proteins, but none of these approaches have been able to overcome the antigenic
variability (eg. Ala'Aldeen & Borriello (1996) The meningococcal transferrin-binding proteins 1
and 2 are both surface exposed and generate bactericidal antibodies capable of killing homologous
and heterologous strains. Vaccine 14(1):49-53).

A certain amount of sequence data is available for meningococcal and gonococcal genes and
proteins (eg. EP-A-0467714, WO96/29412), but this is by no means complete. The provision of
further sequences could provide an opportunity to identify secreted or surface-exposed proteins that
are presumed targets for the immune system and which are not antigenically variable. For instance,
some of the identified proteins could be components of efficacious vaccines against meningococcus 
B, some could be components of vaccines against all meningococcal serotypes, and others could 
be components of vaccines against all pathogenic Neisseriae. 

THE INVENTION 

The invention provides proteins comprising the Neisserial amino acid sequences disclosed in the
examples. These sequences relate to N. meningitidis or N. gonorrhoeae.

It also provides proteins comprising sequences homologous (ie. having sequence identity) to the 
Neisserial amino acid sequences disclosed in the examples. Depending on the particular sequence, 
the degree of identity is preferably greater than 50% (eg. 65%, 80%, 90%, or more). These 
homologous proteins include mutants and allelic variants of the sequences disclosed in the
examples. Typically, 50% identity or more between two proteins is considered to be an indication of
functional equivalence. Identity between the proteins is preferably determined by the Smith-Waterman
homology search algorithm as implemented in the MPSRCH program (Oxford Molecular), using an
affine gap search with parameters gap open penalty=12 and gap extension penalty=1.
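
As a rough illustration of the comparison described above, the following Python sketch scores a local alignment with affine gap penalties (gap open penalty 12, gap extension penalty 1) and computes percent identity for a pair of sequences that has already been aligned. It is not the MPSRCH implementation: the match and mismatch scores stand in for a substitution matrix, the gap-cost convention and the identity denominator are assumptions, and the function names and example peptides are hypothetical.

```python
def smith_waterman_affine(a, b, match=5, mismatch=-4, gap_open=12, gap_extend=1):
    """Best local alignment score using affine gaps (Gotoh-style recurrences).

    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    match/mismatch are placeholder scores, not a real substitution matrix.
    """
    neg = float("-inf")
    m = len(b)
    h_prev = [0] * (m + 1)      # best score ending at previous row
    f_prev = [neg] * (m + 1)    # best score ending in a gap in b (previous row)
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg] * (m + 1)
        e = neg                 # best score ending in a gap in a (current row)
        for j in range(1, m + 1):
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding identical residues.

    Assumption: the denominator is the full alignment length, gaps included.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)
```

With these placeholder scores, two hypothetical peptides that differ by a single deleted residue align across the gap rather than stopping at it, and the resulting alignment can be checked against the 50% threshold with `percent_identity`.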

The invention further provides proteins comprising fragments of the Neisserial amino acid
sequences disclosed in the examples. The fragments should comprise at least n consecutive amino 
acids from the sequences and, depending on the particular sequence, n is 7 or more (eg. 8, 10, 12, 
14, 16, 18, 20 or more). Preferably the fragments comprise an epitope from the sequence. 
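
The fragment criterion above can be sketched in Python as a check for whether a protein comprises at least n consecutive amino acids from a reference sequence. The function name and the example sequences are hypothetical and serve only to illustrate the definition; they are not sequences from the examples.

```python
def shares_fragment(protein, reference, n=7):
    """Return True if `protein` contains at least n consecutive residues
    that also occur consecutively in `reference`."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Every window of n consecutive residues in the reference sequence.
    windows = {reference[i:i + n] for i in range(len(reference) - n + 1)}
    # A shared stretch of n or more residues always contains a shared n-window.
    return any(protein[i:i + n] in windows
               for i in range(len(protein) - n + 1))


reference = "MKTLLAVSAGCQ"                              # hypothetical sequence
print(shares_fragment("GGGTLLAVSAGG", reference, n=7))  # shares TLLAVSA
print(shares_fragment("GGGTLLAVSGGG", reference, n=7))  # only 6 in a row
```

Checking fixed-length windows is sufficient because any longer shared stretch necessarily contains a shared stretch of exactly n residues.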

The proteins of the invention can, of course, be prepared by various means (eg. recombinant
expression, purification from cell culture, chemical synthesis etc.) and in various forms (eg. native,
fusions etc.). They are preferably prepared in substantially pure or isolated form (ie. substantially
free from other Neisserial or host cell proteins).

According to a further aspect, the invention provides antibodies which bind to these proteins. These
may be polyclonal or monoclonal and may be produced by any suitable means. 

According to a further aspect, the invention provides nucleic acid comprising the Neisserial 
nucleotide sequences disclosed in the examples. In addition, the invention provides nucleic acid 
comprising sequences homologous (ie. having sequence identity) to the Neisserial nucleotide 
sequences disclosed in the examples.

Furthermore, the invention provides nucleic acid which can hybridise to the Neisserial nucleic acid 
disclosed in the examples, preferably under "high stringency" conditions (eg. 65°C in a 0.1xSSC,
0.5% SDS solution). 

Nucleic acid comprising fragments of these sequences is also provided. These should comprise
at least n consecutive nucleotides from the Neisserial sequences and, depending on the particular
sequence, n is 10 or more (eg. 12, 14, 15, 18, 20, 25, 30, 35, 40 or more).

According to a further aspect, the invention provides nucleic acid encoding the proteins and protein 
fragments of the invention. 

It should also be appreciated that the invention provides nucleic acid comprising sequences 
complementary to those described above (eg. for antisense or probing purposes).
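
For illustration, a sequence complementary to a given DNA strand can be derived as in the following Python sketch. The function name and the example sequence are hypothetical, and only the four unambiguous DNA bases are handled.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGGCC"))  # hypothetical probe target
```

Applying the function twice returns the original strand, which is a quick sanity check when designing antisense or probe sequences.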

Nucleic acid according to the invention can, of course, be prepared in many ways (eg. by chemical 
synthesis, from genomic or cDNA libraries, from the organism itself etc.) and can take various 
forms (eg. single stranded, double stranded, vectors, probes etc.). 

In addition, the term "nucleic acid" includes DNA and RNA, and also their analogues, such as
those containing modified backbones, and also peptide nucleic acids (PNA) etc.

According to a further aspect, the invention provides vectors comprising nucleotide sequences of
the invention (eg. expression vectors) and host cells transformed with such vectors. 

According to a further aspect, the invention provides compositions comprising protein, antibody, 
and/or nucleic acid according to the invention. These compositions may be suitable as vaccines, 
for instance, or as diagnostic reagents, or as immunogenic compositions.

The invention also provides nucleic acid, protein, or antibody according to the invention for use 
as medicaments (eg. as vaccines) or as diagnostic reagents. It also provides the use of nucleic acid, 
protein, or antibody according to the invention in the manufacture of: (i) a medicament for treating 
or preventing infection due to Neisserial bacteria; (ii) a diagnostic reagent for detecting the 
presence of Neisserial bacteria or of antibodies raised against Neisserial bacteria; and/or (iii) a
reagent which can raise antibodies against Neisserial bacteria. Said Neisserial bacteria may be any
species or strain (such as N. gonorrhoeae, or any strain of N. meningitidis, such as strain A, strain
B or strain C). 

The invention also provides a method of treating a patient, comprising administering to the patient 
a therapeutically effective amount of nucleic acid, protein, and/or antibody according to the
invention. 

According to further aspects, the invention provides various processes. 

A process for producing proteins of the invention is provided, comprising the step of culturing a 
host cell according to the invention under conditions which induce protein expression. 

A process for producing protein or nucleic acid of the invention is provided, wherein the protein
or nucleic acid is synthesised in part or in whole using chemical means. 

A process for detecting polynucleotides of the invention is provided, comprising the steps of: (a) 
contacting a nucleic acid probe according to the invention with a biological sample under hybridizing
conditions to form duplexes; and (b) detecting said duplexes. 

A process for detecting proteins of the invention is provided, comprising the steps of: (a) contacting
an antibody according to the invention with a biological sample under conditions suitable for the
formation of antibody-antigen complexes; and (b) detecting said complexes.

A summary of standard techniques and procedures which may be employed in order to perform the
invention (eg. to utilise the disclosed sequences for vaccination or diagnostic purposes) follows. 
This summary is not a limitation on the invention but, rather, gives examples that may be used, but 
are not required. 

General

The practice of the present invention will employ, unless otherwise indicated, conventional 
techniques of molecular biology, microbiology, recombinant DNA, and immunology, which are 
within the skill of the art. Such techniques are explained fully in the literature eg. Sambrook
Molecular Cloning: A Laboratory Manual, Second Edition (1989); DNA Cloning, Volumes I and
II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid
Hybridization (B.D. Hames & S.J. Higgins eds. 1984); Transcription and Translation (B.D. Hames
& S.J. Higgins eds. 1984); Animal Cell Culture (R.I. Freshney ed. 1986); Immobilized Cells and
Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide to Molecular Cloning (1984); the
Methods in Enzymology series (Academic Press, Inc.), especially volumes 154 & 155; Gene
Transfer Vectors for Mammalian Cells (J.H. Miller and M.P. Calos eds. 1987, Cold Spring Harbor
Laboratory); Mayer and Walker, eds. (1987), Immunochemical Methods in Cell and Molecular
Biology (Academic Press, London); Scopes, (1987) Protein Purification: Principles and Practice,
Second Edition (Springer-Verlag, N.Y.), and Handbook of Experimental Immunology, Volumes
I-IV (D.M. Weir and C.C. Blackwell eds. 1986).

Standard abbreviations for nucleotides and amino acids are used in this specification.

All publications, patents, and patent applications cited herein are incorporated in full by reference. 
In particular, the contents of UK patent applications 9723516.2, 9724190.5, 9724386.9, 9725158.1, 
9726147.3, 9800759.4, and 9819016.8 are incorporated herein. 

Definitions 

A composition containing X is "substantially free of" Y when at least 85% by weight of the total
X+Y in the composition is X. Preferably, X comprises at least about 90% by weight of the total of 
X+Y in the composition, more preferably at least about 95% or even 99% by weight. 
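
The weight criterion in this definition amounts to a simple ratio, X / (X + Y). A minimal Python sketch follows; the function names and the example weights are hypothetical, while the 85% default and the stricter 90%, 95% and 99% levels come from the definition above.

```python
def weight_fraction_x(weight_x, weight_y):
    """Fraction by weight of X in the total of X + Y."""
    total = weight_x + weight_y
    if weight_x < 0 or weight_y < 0 or total <= 0:
        raise ValueError("weights must be non-negative and not both zero")
    return weight_x / total


def is_substantially_free(weight_x, weight_y, threshold=0.85):
    """True if X makes up at least `threshold` of the total X + Y by weight."""
    return weight_fraction_x(weight_x, weight_y) >= threshold


# Hypothetical preparation: 9.6 mg of X with 0.4 mg of Y.
print(is_substantially_free(9.6, 0.4))                  # default 85% level
print(is_substantially_free(9.6, 0.4, threshold=0.99))  # stricter 99% level
```

Any consistent unit of weight may be used, since only the ratio matters.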

The term "comprising" means "including" as well as "consisting" eg. a composition "comprising" 
X may consist exclusively of X or may include something additional to X, such as X+Y. 

The term "heterologous" refers to two biological components that are not found together in nature.
The components may be host cells, genes, or regulatory regions, such as promoters. Although the
heterologous components are not found together in nature, they can function together, as when a
promoter heterologous to a gene is operably linked to the gene. Another example is where a
Neisserial sequence is heterologous to a mouse host cell. A further example would be two epitopes
from the same or different proteins which have been assembled in a single protein in an
arrangement not found in nature.

An "origin of replication" is a polynucleotide sequence that initiates and regulates replication of
polynucleotides, such as an expression vector. The origin of replication behaves as an autonomous
unit of polynucleotide replication within a cell, capable of replication under its own control. An
origin of replication may be needed for a vector to replicate in a particular host cell. With certain
origins of replication, an expression vector can be reproduced at a high copy number in the
presence of the appropriate proteins within the cell. Examples of origins are the autonomously
replicating sequences, which are effective in yeast; and the viral T-antigen, effective in COS-7
cells.

A "mutant" sequence is defined as DNA, RNA or amino acid sequence differing from but having 
sequence identity with the native or disclosed sequence. Depending on the particular sequence, the 
degree of sequence identity between the native or disclosed sequence and the mutant sequence is 
preferably greater than 50% (eg. 60%, 70%, 80%, 90%, 95%, 99% or more, calculated using the
Smith-Waterman algorithm as described above). As used herein, an "allelic variant" of a nucleic
acid molecule, or region, for which nucleic acid sequence is provided herein is a nucleic acid
molecule, or region, that occurs essentially at the same locus in the genome of another or second
isolate, and that, due to natural variation caused by, for example, mutation or recombination, has
a similar but not identical nucleic acid sequence. A coding region allelic variant typically encodes
a protein having similar activity to that of the protein encoded by the gene to which it is being
compared. An allelic variant can also comprise an alteration in the 5' or 3' untranslated regions of 
the gene, such as in regulatory control regions (eg. see US patent 5,753,235). 

Expression systems

The Neisserial nucleotide sequences can be expressed in a variety of different expression systems;
for example those used with mammalian cells, baculoviruses, plants, bacteria, and yeast.

i. Mammalian Systems

Mammalian expression systems are known in the art. A mammalian promoter is any DNA
sequence capable of binding mammalian RNA polymerase and initiating the downstream (3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a
transcription initiating region, which is usually placed proximal to the 5' end of the coding
sequence, and a TATA box, usually located 25-30 base pairs (bp) upstream of the transcription
initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at
the correct site. A mammalian promoter will also contain an upstream promoter element, usually
located within 100 to 200 bp upstream of the TATA box. An upstream promoter element
determines the rate at which transcription is initiated and can act in either orientation [Sambrook
et al. (1989) "Expression of Cloned Genes in Mammalian Cells." In Molecular Cloning: A
Laboratory Manual, 2nd ed.].
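
The promoter geometry described above can be illustrated with a short Python sketch that looks for a TATA-like motif 25-30 bp upstream of a known transcription initiation site. This is only a sketch under stated assumptions: the simplified motif TATA[AT]A[AT], the choice to measure distance from the start of the motif, the function name, and the example sequence are all hypothetical and are not taken from the cited references.

```python
import re

# Simplified TATA-like motif (an assumption for illustration only).
# The lookahead lets overlapping candidate positions be reported.
TATA_PATTERN = re.compile(r"(?=TATA[AT]A[AT])")


def find_tata_upstream(seq, tss, min_dist=25, max_dist=30):
    """Return start indices of TATA-like motifs lying min_dist..max_dist bp
    upstream of `tss`, the 0-based index of the transcription initiation site.
    Distance is measured from the first base of the motif (an assumption)."""
    hits = []
    for m in TATA_PATTERN.finditer(seq.upper()):
        distance = tss - m.start()
        if min_dist <= distance <= max_dist:
            hits.append(m.start())
    return hits


# Hypothetical promoter fragment: motif at index 10, initiation site at 40.
promoter = "G" * 10 + "TATAAAA" + "G" * 23 + "ATGCCC"
print(find_tata_upstream(promoter, tss=40))
```

A scan of this kind only flags candidate positions; whether a given motif functions as a TATA box has to be established experimentally.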

Mammalian viral genes are often highly expressed and have a broad host range; therefore sequences
encoding mammalian viral genes provide particularly useful promoter sequences. Examples include
the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late
promoter (Ad MLP), and herpes simplex virus promoter. In addition, sequences derived from
non-viral genes, such as the murine metallothionein gene, also provide useful promoter sequences.
Expression may be either constitutive or regulated (inducible); depending on the promoter, expression
can be induced with glucocorticoid in hormone-responsive cells.

The presence of an enhancer element (enhancer), combined with the promoter elements described
above, will usually increase expression levels. An enhancer is a regulatory DNA sequence that can
stimulate transcription up to 1000-fold when linked to homologous or heterologous promoters, with
synthesis beginning at the normal RNA start site. Enhancers are also active when they are placed
upstream or downstream from the transcription initiation site, in either normal or flipped
orientation, or at a distance of more than 1000 nucleotides from the promoter [Maniatis et al. (1987)
Science 236:1237; Alberts et al. (1989) Molecular Biology of the Cell, 2nd ed.]. Enhancer elements
derived from viruses may be particularly useful, because they usually have a broader host range.
Examples include the SV40 early gene enhancer [Dijkema et al (1985) EMBO J. 4:761] and the
enhancer/promoters derived from the long terminal repeat (LTR) of the Rous Sarcoma Virus
[Gorman et al. (1982b) Proc Natl Acad Sci 79:6777] and from human cytomegalovirus [Boshart
et al. (1985) Cell 41:521]. Additionally, some enhancers are regulatable and become active only
in the presence of an inducer, such as a hormone or metal ion [Sassone-Corsi and Borelli (1986)
Trends Genet. 2:215; Maniatis et al. (1987) Science 236:1237].

A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence may be
directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of the
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If desired,
the N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide.

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating 
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that 
provides for secretion of the foreign protein in mammalian cells. Preferably, there are processing 
sites encoded between the leader fragment and the foreign gene that can be cleaved either in vivo
or in vitro. The leader sequence fragment usually encodes a signal peptide comprised of
hydrophobic amino acids which direct the secretion of the protein from the cell. The adenovirus
tripartite leader is an example of a leader sequence that provides for secretion of a foreign protein
in mammalian cells. 

Usually, transcription termination and polyadenylation sequences recognized by mammalian cells
are regulatory regions located 3' to the translation stop codon and thus, together with the promoter
elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed by
site-specific post-transcriptional cleavage and polyadenylation [Birnstiel et al. (1985) Cell 41:349;
Proudfoot and Whitelaw (1988) "Termination and 3' end processing of eukaryotic RNA." In
Transcription and splicing (ed. B.D. Hames and D.M. Glover); Proudfoot (1989) Trends Biochem.
Sci. 14:105]. These sequences direct the transcription of an mRNA which can be translated into the
polypeptide encoded by the DNA. Examples of transcription terminator/polyadenylation signals
include those derived from SV40 [Sambrook et al (1989) "Expression of cloned genes in cultured
mammalian cells." In Molecular Cloning: A Laboratory Manual].

Usually, the above described components, comprising a promoter, polyadenylation signal, and
transcription termination sequence are put together into expression constructs. Enhancers, introns
with functional splice donor and acceptor sites, and leader sequences may also be included in an
expression construct, if desired. Expression constructs are often maintained in a replicon, such as
an extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as
mammalian cells or bacteria. Mammalian replication systems include those derived from animal
viruses, which require trans-acting factors to replicate. For example, plasmids containing the
replication systems of papovaviruses, such as SV40 [Gluzman (1981) Cell 25:175] or
polyomavirus, replicate to extremely high copy number in the presence of the appropriate viral T
antigen. Additional examples of mammalian replicons include those derived from bovine
papillomavirus and Epstein-Barr virus. Additionally, the replicon may have two replication systems,
thus allowing it to be maintained, for example, in mammalian cells for expression and in a
prokaryotic host for cloning and amplification. Examples of such mammalian-bacteria shuttle
vectors include pMT2 [Kaufman et al. (1989) Mol Cell Biol 9:946] and pHEBO [Shimizu et al.
(1986) Mol Cell Biol 6:1074].

The transformation procedure used depends upon the host to be transformed. Methods for
introduction of heterologous polynucleotides into mammalian cells are known in the art and include 
dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated transfection, 
protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct 
microinjection of the DNA into nuclei. 

Mammalian cell lines available as hosts for expression are known in the art and include many
immortalized cell lines available from the American Type Culture Collection (ATCC), including 
but not limited to, Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK) 
cells, monkey kidney cells (COS), human hepatocellular carcinoma cells (eg. Hep G2), and a 
number of other cell lines. 

ii. Baculovirus Systems

The polynucleotide encoding the protein can also be inserted into a suitable insect expression vector,
and is operably linked to the control elements within that vector. Vector construction employs
techniques which are known in the art. Generally, the components of the expression system include
a transfer vector, usually a bacterial plasmid, which contains both a fragment of the baculovirus
genome, and a convenient restriction site for insertion of the heterologous gene or genes to be
expressed; a wild type baculovirus with a sequence homologous to the baculovirus-specific fragment
in the transfer vector (this allows for the homologous recombination of the heterologous gene into
the baculovirus genome); and appropriate insect host cells and growth media.

After inserting the DNA sequence encoding the protein into the transfer vector, the vector and the
wild type viral genome are transfected into an insect host cell where the vector and viral genome
are allowed to recombine. The packaged recombinant virus is expressed and recombinant plaques
are identified and purified. Materials and methods for baculovirus/insect cell expression systems
are commercially available in kit form from, inter alia, Invitrogen, San Diego CA ("MaxBac" kit).
These techniques are generally known to those skilled in the art and fully described in Summers
and Smith, Texas Agricultural Experiment Station Bulletin No. 1555 (1987) (hereinafter "Summers
and Smith").

Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the above 
described components, comprising a promoter, leader (if desired), coding sequence of interest, and 
transcription termination sequence, are usually assembled into an intermediate transplacement 
construct (transfer vector). This construct may contain a single gene and operably linked regulatory 
elements; multiple genes, each with its own set of operably linked regulatory elements; or multiple
genes, regulated by the same set of regulatory elements. Intermediate transplacement constructs are 
often maintained in a replicon, such as an extrachromosomal element (eg. plasmids) capable of stable 
maintenance in a host, such as a bacterium. The replicon will have a replication system, thus allowing 
it to be maintained in a suitable host for cloning and amplification. 

Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is
pAc373. Many other vectors, known to those of skill in the art, have also been designed. These
include, for example, pVL985 (which alters the polyhedrin start codon from ATG to ATT, and
which introduces a BamHI cloning site 32 basepairs downstream from the ATT; see Luckow and
Summers, Virology (1989) 77:31).

The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al. (1988) Ann.
Rev. Microbiol., 42:177) and a prokaryotic ampicillin-resistance (amp) gene and origin of
replication for selection and propagation in E. coli.

Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is any
DNA sequence capable of binding a baculovirus RNA polymerase and initiating the downstream
(5' to 3') transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have
a transcription initiation region which is usually placed proximal to the 5' end of the coding
sequence. This transcription initiation region usually includes an RNA polymerase binding site and
a transcription initiation site. A baculovirus transfer vector may also have a second domain called
an enhancer, which, if present, is usually distal to the structural gene. Expression may be either
regulated or constitutive.

Structural genes, abundantly transcribed at late times in a viral infection cycle, provide particularly
useful promoter sequences. Examples include sequences derived from the gene encoding the viral
polyhedron protein, Friesen et al., (1986) "The Regulation of Baculovirus Gene Expression," in:
The Molecular Biology of Baculoviruses (ed. Walter Doerfler); EPO Publ. Nos. 127 839 and 155
476; and the gene encoding the p10 protein, Vlak et al., (1988), J. Gen. Virol. 69:765.

DNA encoding suitable signal sequences can be derived from genes for secreted insect or
baculovirus proteins, such as the baculovirus polyhedrin gene (Carbonell et al. (1988) Gene,
73:409). Alternatively, since the signals for mammalian cell posttranslational modifications (such
as signal peptide cleavage, proteolytic cleavage, and phosphorylation) appear to be recognized by
insect cells, and the signals required for secretion and nuclear accumulation also appear to be
conserved between the invertebrate cells and vertebrate cells, leaders of non-insect origin, such as
those derived from genes encoding human α-interferon, Maeda et al., (1985), Nature 315:592;
human gastrin-releasing peptide, Lebacq-Verheyden et al., (1988), Molec. Cell. Biol. 8:3129;
human IL-2, Smith et al., (1985) Proc. Nat'l Acad. Sci. USA, 82:8404; mouse IL-3, Miyajima et
al., (1987) Gene 58:273; and human glucocerebrosidase, Martin et al. (1988) DNA, 7:99, can also
be used to provide for secretion in insects.

A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed
with the proper regulatory sequences, it can be secreted. Good intracellular expression of nonfused
foreign proteins usually requires heterologous genes that ideally have a short leader sequence
containing suitable translation initiation signals preceding an ATG start signal. If desired,
methionine at the N-terminus may be cleaved from the mature protein by in vitro incubation with
cyanogen bromide.

Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be secreted 
from the insect cell by creating chimeric DNA molecules that encode a fusion protein comprised 
of a leader sequence fragment that provides for secretion of the foreign protein in insects. The
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids 
which direct the translocation of the protein into the endoplasmic reticulum. 

After insertion of the DNA sequence and/or the gene encoding the expression product precursor
of the protein, an insect cell host is co-transformed with the heterologous DNA of the transfer
vector and the genomic DNA of wild type baculovirus -- usually by co-transfection. The promoter
and transcription termination sequence of the construct will usually comprise a 2-5kb section of the
baculovirus genome. Methods for introducing heterologous DNA into the desired site in the
baculovirus are known in the art. (See Summers and Smith supra; Ju et al. (1987); Smith et
al., Mol. Cell. Biol. (1983) 3:2156; and Luckow and Summers (1989)). For example, the insertion
can be into a gene such as the polyhedrin gene, by homologous double crossover recombination;
insertion can also be into a restriction enzyme site engineered into the desired baculovirus gene
(Miller et al., (1989), Bioessays 4:91). The DNA sequence, when cloned in place of the polyhedrin
gene in the expression vector, is flanked both 5' and 3' by polyhedrin-specific sequences and is
positioned downstream of the polyhedrin promoter.

The newly formed baculovirus expression vector is subsequently packaged into an infectious
recombinant baculovirus. Homologous recombination occurs at low frequency (between about 1%
and about 5%); thus, the majority of the virus produced after cotransfection is still wild-type virus.
Therefore, a method is necessary to identify recombinant viruses. An advantage of the expression
system is a visual screen allowing recombinant viruses to be distinguished. The polyhedrin protein,
which is produced by the native virus, is produced at very high levels in the nuclei of infected cells
at late times after viral infection. Accumulated polyhedrin protein forms occlusion bodies that also
contain embedded particles. These occlusion bodies, up to 15 µm in size, are highly refractile,
giving them a bright shiny appearance that is readily visualized under the light microscope. Cells
infected with recombinant viruses lack occlusion bodies. To distinguish recombinant virus from
wild-type virus, the transfection supernatant is plaqued onto a monolayer of insect cells by
techniques known to those skilled in the art. Namely, the plaques are screened under the light
microscope for the presence (indicative of wild-type virus) or absence (indicative of recombinant
virus) of occlusion bodies. "Current Protocols in Microbiology" Vol. 2 (Ausubel et al. eds) at 16.8
(Supp. 10, 1990); Summers and Smith, supra; Miller et al. (1989).
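
Because recombinants make up only about 1% to 5% of the progeny virus, the screening effort can be estimated with simple probability. The sketch below is illustrative only and is not part of the cited protocols: it assumes each plaque is independently recombinant with a fixed probability, and the function name and the 95% default confidence are hypothetical choices.

```python
import math


def plaques_to_screen(recombinant_fraction: float, confidence: float = 0.95) -> int:
    """Smallest number of plaques n with P(at least one recombinant) >= confidence.

    Assumes independent plaques, each recombinant with probability
    `recombinant_fraction`, so P(no recombinant in n) = (1 - fraction) ** n.
    """
    if not 0.0 < recombinant_fraction < 1.0:
        raise ValueError("recombinant_fraction must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Solve (1 - fraction) ** n <= 1 - confidence for n and round up.
    n = math.log(1.0 - confidence) / math.log(1.0 - recombinant_fraction)
    return math.ceil(n)


if __name__ == "__main__":
    # The two ends of the recombination frequency range quoted in the text.
    for fraction in (0.01, 0.05):
        print(f"fraction={fraction:.2f}: screen {plaques_to_screen(fraction)} plaques")
```

Under these assumptions, a lower recombination frequency sharply increases the number of plaques that must be inspected, which is why the visual occlusion-body screen is useful.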

Recombinant baculovirus expression vectors have been developed for infection into several insect
cells. For example, recombinant baculoviruses have been developed for, inter alia: Aedes aegypti,
Autographa californica, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and
Trichoplusia ni (WO 89/046699; Carbonell et al., (1985) J. Virol. 56:153; Wright (1986) Nature
321:718; Smith et al., (1983) Mol. Cell. Biol. 3:2156; and see generally, Fraser, et al. (1989) In
Vitro Cell. Dev. Biol. 25:225).

Cells and cell culture media are commercially available for both direct and fusion expression of
heterologous polypeptides in a baculovirus/expression system; cell culture technology is generally 
known to those skilled in the art. See, eg. Summers and Smith supra. 

The modified insect cells may then be grown in an appropriate nutrient medium, which allows for 
stable maintenance of the plasmid(s) present in the modified insect host. Where the expression product
gene is under inducible control, the host may be grown to high density, and expression induced.
Alternatively, where expression is constitutive, the product will be continuously expressed into the
medium and the nutrient medium must be continuously circulated, while removing the product of
interest and augmenting depleted nutrients. The product may be purified by such techniques as
chromatography, eg. HPLC, affinity chromatography, ion exchange chromatography, etc.;
electrophoresis; density gradient centrifugation; solvent extraction, or the like. As appropriate, the 
product may be further purified, as required, so as to remove substantially any insect proteins which 
are also secreted in the medium or result from lysis of insect cells, so as to provide a product which 
is at least substantially free of host debris, eg. proteins, lipids and polysaccharides. 

In order to obtain protein expression, recombinant host cells derived from the transformants are
incubated under conditions which allow expression of the recombinant protein encoding sequence. 
These conditions will vary, dependent upon the host cell selected. However, the conditions are 
readily ascertainable to those of ordinary skill in the art, based upon what is known in the art. 

iii. Plant Systems 

There are many plant cell culture and whole plant genetic expression systems known in the art.
Exemplary plant cellular genetic expression systems include those described in patents, such as:
US 5,693,506; US 5,659,122; and US 5,608,143. Additional examples of genetic expression in
plant cell culture have been described by Zenk, Phytochemistry 30:3861-3863 (1991). Descriptions
of plant protein signal peptides may be found in addition to the references described above in
Vaulcombe et al., Mol. Gen. Genet. 209:33-40 (1987); Chandler et al., Plant Molecular Biology
3:407-418 (1984); Rogers, J. Biol. Chem. 260:3731-3738 (1985); Rothstein et al., Gene 55:353-356
(1987); Whittier et al., Nucleic Acids Research 15:2515-2535 (1987); Wirsel et al., Molecular
Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A description of the regulation
of plant gene expression by the phytohormone, gibberellic acid and secreted enzymes induced by
gibberellic acid can be found in R.L. Jones and J. MacMillin, Gibberellins: in: Advanced Plant
Physiology, Malcolm B. Wilkins, ed., 1984 Pitman Publishing Limited, London, pp. 21-52.
References that describe other metabolically-regulated genes: Sheen, Plant Cell, 2:1027-1038
(1990); Maas et al., EMBO J. 9:3447-3452 (1990); Benkel and Hickey, Proc. Natl. Acad. Sci.
84:1337-1339 (1987).

Typically, using techniques known in the art, a desired polynucleotide sequence is inserted into an 
expression cassette comprising genetic regulatory elements designed for operation in plants. The
expression cassette is inserted into a desired expression vector with companion sequences upstream
and downstream from the expression cassette suitable for expression in a plant host. The
companion sequences will be of plasmid or viral origin and provide necessary characteristics to the
vector to permit the vectors to move DNA from an original cloning host, such as bacteria, to the
desired plant host. The basic bacterial/plant vector construct will preferably provide a broad host
range prokaryote replication origin; a prokaryote selectable marker; and, for Agrobacterium
transformations, T DNA sequences for Agrobacterium-mediated transfer to plant chromosomes.
Where the heterologous gene is not readily amenable to detection, the construct will preferably also
have a selectable marker gene suitable for determining if a plant cell has been transformed. A
general review of suitable markers, for example for the members of the grass family, is found in
Wilmink and Dons, 1993, Plant Mol. Biol. Reptr, 11(2):165-185.

Sequences suitable for permitting integration of the heterologous sequence into the plant genome 
are also recommended. These might include transposon sequences and the like for homologous 
recombination as well as Ti sequences which permit random insertion of a heterologous expression 
cassette into a plant genome. Suitable prokaryote selectable markers include resistance toward
antibiotics such as ampicillin or tetracycline. Other DNA sequences encoding additional functions 
may also be present in the vector, as is known in the art. 

The nucleic acid molecules of the subject invention may be included into an expression cassette 
for expression of the protein(s) of interest. Usually, there will be only one expression cassette, 
although two or more are feasible. The recombinant expression cassette will contain, in addition
to the heterologous protein encoding sequence, the following elements: a promoter region, plant 5'
untranslated sequences, initiation codon depending upon whether or not the structural gene comes 
equipped with one, and a transcription and translation termination sequence. Unique restriction 
enzyme sites at the 5' and 3' ends of the cassette allow for easy insertion into a pre-existing vector.

A heterologous coding sequence may be for any protein relating to the present invention. The
sequence encoding the protein of interest will encode a signal peptide which allows processing and
translocation of the protein, as appropriate, and will usually lack any sequence which might result
in the binding of the desired protein of the invention to a membrane. Since, for the most part, the
transcriptional initiation region will be for a gene which is expressed and translocated during
germination, by employing the signal peptide which provides for translocation, one may also
provide for translocation of the protein of interest. In this way, the protein(s) of interest will be
translocated from the cells in which they are expressed and may be efficiently harvested. Typically,
secretion in seeds is across the aleurone or scutellar epithelium layer into the endosperm of the
seed. While it is not required that the protein be secreted from the cells in which the protein is
produced, this facilitates the isolation and purification of the recombinant protein.

Since the ultimate expression of the desired gene product will be in a eucaryotic cell it is desirable 
to determine whether any portion of the cloned gene contains sequences which will be processed 
out as introns by the host's spliceosome machinery. If so, site-directed mutagenesis of the "intron"
region may be conducted to prevent losing a portion of the genetic message as a false intron code
(Reed and Maniatis, Cell 41:95-105, 1985).

The vector can be microinjected directly into plant cells by use of micropipettes to mechanically
transfer the recombinant DNA. Crossway, Mol. Gen. Genet. 202:179-185, 1985. The genetic
material may also be transferred into the plant cell by using polyethylene glycol, Krens, et al.,
Nature, 296, 72-74, 1982. Another method of introduction of nucleic acid segments is high
velocity ballistic penetration by small particles with the nucleic acid either within the matrix of
small beads or particles, or on the surface, Klein, et al., Nature, 327, 70-73, 1987 and Knudsen and
Muller, 1991, Planta, 185:330-336 teaching particle bombardment of barley endosperm to create
transgenic barley. Yet another method of introduction would be fusion of protoplasts with other
entities, either minicells, cells, lysosomes or other fusible lipid-surfaced bodies, Fraley, et al., Proc.
Natl. Acad. Sci. USA, 79, 1859-1863, 1982.

The vector may also be introduced into the plant cells by electroporation. (Fromm et al., Proc. Natl 
Acad. Sci. USA 82:5824, 1985). In this technique, plant protoplasts are electroporated in the 
presence of plasmids containing the gene construct. Electrical impulses of high field strength 
reversibly permeabilize biomembranes allowing the introduction of the plasmids. Electroporated
plant protoplasts reform the cell wall, divide, and form plant callus.

All plants from which protoplasts can be isolated and cultured to give whole regenerated plants can
be transformed by the present invention so that whole plants are recovered which contain the
transferred gene. It is known that practically all plants can be regenerated from cultured cells or
tissues, including but not limited to all major species of sugarcane, sugar beet, cotton, fruit and
other trees, legumes and vegetables. Some suitable plants include, for example, species from the
genera Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum,
Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum,
Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium,
Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium,
Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Lolium,
Zea, Triticum, Sorghum, and Datura.

Means for regeneration vary from species to species of plants, but generally a suspension of
transformed protoplasts containing copies of the heterologous gene is first provided. Callus tissue
is formed and shoots may be induced from callus and subsequently rooted. Alternatively, embryo
formation can be induced from the protoplast suspension. These embryos germinate as natural
embryos to form plants. The culture media will generally contain various amino acids and
hormones, such as auxin and cytokinins. It is also advantageous to add glutamic acid and proline
to the medium, especially for such species as corn and alfalfa. Shoots and roots normally develop
simultaneously. Efficient regeneration will depend on the medium, on the genotype, and on the
history of the culture. If these three variables are controlled, then regeneration is fully reproducible
and repeatable.

In some plant cell culture systems, the desired protein of the invention may be excreted or 
alternatively, the protein may be extracted from the whole plant. Where the desired protein of the 
invention is secreted into the medium, it may be collected. Alternatively, the embryos and 
embryoless-half seeds or other plant tissue may be mechanically disrupted to release any secreted
protein between cells and tissues. The mixture may be suspended in a buffer solution to retrieve
soluble proteins. Conventional protein isolation and purification methods will be then used to
purify the recombinant protein. Parameters of time, temperature, pH, oxygen, and volumes will be
adjusted through routine methods to optimize expression and recovery of heterologous protein.

iv. Bacterial Systems

Bacterial expression techniques are known in the art. A bacterial promoter is any DNA sequence
capable of binding bacterial RNA polymerase and initiating the downstream (3') transcription of
a coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiation
region which is usually placed proximal to the 5' end of the coding sequence. This transcription
initiation region usually includes an RNA polymerase binding site and a transcription initiation site.
A bacterial promoter may also have a second domain called an operator, that may overlap an
adjacent RNA polymerase binding site at which RNA synthesis begins. The operator permits
negatively regulated (inducible) transcription, as a gene repressor protein may bind the operator and
thereby inhibit transcription of a specific gene. Constitutive expression may occur in the absence
of negative regulatory elements, such as the operator. In addition, positive regulation may be
achieved by a gene activator protein binding sequence, which, if present, is usually proximal (5')
to the RNA polymerase binding sequence. An example of a gene activator protein is the catabolite
activator protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli (E.
coli) [Raibaud et al. (1984) Annu. Rev. Genet. 18:173]. Regulated expression may therefore be
either positive or negative, thereby either enhancing or reducing transcription.

Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences.
Examples include promoter sequences derived from sugar metabolizing enzymes, such as galactose,
lactose (lac) [Chang et al. (1977) Nature 198:1056], and maltose. Additional examples include
promoter sequences derived from biosynthetic enzymes such as tryptophan (trp) [Goeddel et al.
(1980) Nuc. Acids Res. 8:4057; Yelverton et al. (1981) Nucl. Acids Res. 9:731; US
patent 4,738,921; EP-A-0036776 and EP-A-0121775]. The β-lactamase (bla) promoter system
[Weissmann (1981) "The cloning of interferon and other mistakes." In Interferon 3 (ed. I. Gresser)],
bacteriophage lambda PL [Shimatake et al. (1981) Nature 292:128] and T5 [US patent 4,689,406]
promoter systems also provide useful promoter sequences.

In addition, synthetic promoters which do not occur in nature also function as bacterial promoters. 
For example, transcription activation sequences of one bacterial or bacteriophage promoter may 
be joined with the operon sequences of another bacterial or bacteriophage promoter, creating a 
synthetic hybrid promoter [US patent 4,551,433]. For example, the tac promoter is a hybrid trp-lac 
promoter comprised of both trp promoter and lac operon sequences that is regulated by the lac
repressor [Amann et al. (1983) Gene 25:167; de Boer et al. (1983) Proc. Natl. Acad. Sci. 80:21].
Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin
that have the ability to bind bacterial RNA polymerase and initiate transcription. A naturally 
occurring promoter of non-bacterial origin can also be coupled with a compatible RNA polymerase 
to produce high levels of expression of some genes in prokaryotes. The bacteriophage T7 RNA 
polymerase/promoter system is an example of a coupled promoter system [Studier et al. (1986) J.
Mol. Biol. 189:113; Tabor et al. (1985) Proc. Natl. Acad. Sci. 82:1074]. In addition, a hybrid
promoter can also be comprised of a bacteriophage promoter and an E. coli operator region
(EPO-A-0 267 851).

In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful for 
the expression of foreign genes in prokaryotes. In E. coli, the ribosome binding site is called the
Shine-Dalgarno (SD) sequence and includes an initiation codon (ATG) and a sequence 3-9
nucleotides in length located 3-11 nucleotides upstream of the initiation codon [Shine et al. (1975)
Nature 254:34]. The SD sequence is thought to promote binding of mRNA to the ribosome by the
pairing of bases between the SD sequence and the 3' end of E. coli 16S rRNA [Steitz et al. (1979)
"Genetic signals and nucleotide sequences in messenger RNA." In Biological Regulation and
Development: Gene Expression (ed. R.F. Goldberger)]. To express eukaryotic genes and
prokaryotic genes with weak ribosome-binding site [Sambrook et al. (1989) "Expression of cloned 
genes in Escherichia coli." In Molecular Cloning: A Laboratory Manual]. 
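
The length and spacing constraints given above for the SD sequence (3-9 nucleotides, located 3-11 nucleotides upstream of the initiation codon) can be sketched as a simple scan. This is a minimal illustration, not a validated ribosome binding site predictor: the function name, the use of the 9-mer TAAGGAGGT (the DNA complement of the 16S rRNA 3' end) as the reference motif, exact substring matching, and the reading of "upstream" as the gap between the motif's end and the ATG are all assumptions made here.

```python
# DNA-sense motif complementary to the 3' end of E. coli 16S rRNA (assumed reference).
SD_CONSENSUS = "TAAGGAGGT"


def find_shine_dalgarno(seq, atg_index, min_len=3, max_len=9, min_gap=3, max_gap=11):
    """Return (motif, start, gap) for the longest SD-like match, or None.

    `gap` is the number of nucleotides between the end of the motif and
    the A of the ATG at `atg_index`.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index does not point at an ATG codon")
    best = None
    for gap in range(min_gap, max_gap + 1):
        end = atg_index - gap  # exclusive end of the candidate motif
        for length in range(min_len, max_len + 1):
            start = end - length
            if start < 0:
                break  # candidate would run off the 5' end of the sequence
            candidate = seq[start:end]
            # Keep only exact substrings of the reference motif; prefer longer ones.
            if candidate in SD_CONSENSUS and (best is None or length > len(best[0])):
                best = (candidate, start, gap)
    return best


if __name__ == "__main__":
    # Hypothetical toy sequence with AGGAGG placed seven nucleotides before the ATG.
    toy = "CCCAGGAGGCACACACATGAAA"
    print(find_shine_dalgarno(toy, toy.index("ATG")))
```

Exact matching keeps the sketch short; real ribosome binding sites tolerate mismatches, so a practical tool would score partial complementarity instead.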

A DNA molecule may be expressed intracellularly. A promoter sequence may be directly linked 
with the DNA molecule, in which case the first amino acid at the N-terminus will always be a
methionine, which is encoded by the ATG start codon. If desired, methionine at the N-terminus
may be cleaved from the protein by in vitro incubation with cyanogen bromide or by either in vivo
or in vitro incubation with a bacterial methionine N-terminal peptidase (EPO-A-0 219 237).

Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding the 
N-terminal portion of an endogenous bacterial protein, or other stable protein, is fused to the 5' end
of heterologous coding sequences. Upon expression, this construct will provide a fusion of the two
amino acid sequences. For example, the bacteriophage lambda cII gene can be linked at the 5'
terminus of a foreign gene and expressed in bacteria. The resulting fusion protein preferably retains
a site for a processing enzyme (factor Xa) to cleave the bacteriophage protein from the foreign gene
[Nagai et al. (1984) Nature 309:810]. Fusion proteins can also be made with sequences from the
lacZ [Jia et al. (1987) Gene 60:197], trpE [Allen et al. (1987) J. Biotechnol. 5:93; Makoff et al.
(1989) J. Gen. Microbiol. 135:11], and CheY [EP-A-0 324 647] genes. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. Another example
is a ubiquitin fusion protein. Such a fusion protein is made with the ubiquitin region that preferably
retains a site for a processing enzyme (eg. ubiquitin specific processing-protease) to cleave the
ubiquitin from the foreign protein. Through this method, native foreign protein can be isolated
[Miller et al. (1989) Bio/Technology 7:698].
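
The idea of retaining a processing site at the fusion junction can be illustrated in a few lines. The sketch below is a toy model only: it assumes the commonly described factor Xa recognition sequence Ile-Glu-Gly-Arg (IEGR) with cleavage after the arginine, and the function name and the example sequences are hypothetical, not taken from the cited references.

```python
FACTOR_XA_SITE = "IEGR"  # assumed recognition sequence; cleavage occurs after the Arg


def release_foreign_protein(fusion, site=FACTOR_XA_SITE):
    """Split a fusion protein at the first processing site.

    Returns (carrier_with_site, foreign_protein). The foreign protein starts
    immediately after the site, so it carries no extra N-terminal residues.
    """
    index = fusion.find(site)
    if index == -1:
        raise ValueError("fusion protein does not contain the processing site")
    cut = index + len(site)
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    # Hypothetical carrier fragment + IEGR linker + hypothetical foreign protein.
    carrier, foreign = release_foreign_protein("MKTAYIEGRVHLTPEEK")
    print(carrier, foreign)
```

A real design would also check that the site does not occur inside the foreign protein itself, since the protease would then cut there as well.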

Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA molecules 
that encode a fusion protein comprised of a signal peptide sequence fragment that provides for secretion 
of the foreign protein in bacteria [US patent 4,336,336]. The signal sequence fragment usually encodes 
a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the
cell. The protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic 
space, located between the inner and outer membrane of the cell (gram-negative bacteria). Preferably 
there are processing sites, which can be cleaved either in vivo or in vitro encoded between the signal 
peptide fragment and the foreign gene. 
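
The statement that a signal peptide is comprised of hydrophobic amino acids can be made concrete with a sliding-window hydropathy average. The following is a rough sketch, not a signal peptide predictor: it assumes the Kyte-Doolittle hydropathy scale, and the function name, window size, and toy sequence are hypothetical choices for illustration.

```python
# Kyte-Doolittle hydropathy values (more positive means more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(protein, window=8):
    """Return (start, mean_hydropathy) of the most hydrophobic window.

    Only the 20 standard one-letter residue codes are supported; the first
    window with the highest mean is reported when several windows tie.
    """
    protein = protein.upper()
    if len(protein) < window:
        raise ValueError("protein is shorter than the window")
    best_start, best_score = 0, float("-inf")
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        score = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


if __name__ == "__main__":
    # Hypothetical leader: charged N-terminus followed by a hydrophobic core.
    start, score = most_hydrophobic_window("MKKTLLLLAVVAGSAQA", window=8)
    print(start, round(score, 2))
```

A strongly positive window near the N-terminus is consistent with a hydrophobic core of the kind described above, although it does not by itself establish that a sequence functions as a signal peptide.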

DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins,
such as the E. coli outer membrane protein gene (ompA) [Masui et al. (1983), in: Experimental
Manipulation of Gene Expression; Ghrayeb et al. (1984) EMBO J. 3:2437] and the E. coli alkaline
phosphatase signal sequence (phoA) [Oka et al. (1985) Proc. Natl. Acad. Sci. 82:7212]. As an
additional example, the signal sequence of the alpha-amylase gene from various Bacillus strains
can be used to secrete heterologous proteins from B. subtilis [Palva et al. (1982) Proc. Natl. Acad.
Sci. USA 79:5582; EP-A-0 244 042].

Usually, transcription termination sequences recognized by bacteria are regulatory regions located 
3' to the translation stop codon, and thus together with the promoter flank the coding sequence.
These sequences direct the transcription of an mRNA which can be translated into the polypeptide
encoded by the DNA. Transcription termination sequences frequently include DNA sequences of
about 50 nucleotides capable of forming stem loop structures that aid in terminating transcription. 
Examples include transcription termination sequences derived from genes with strong promoters, 
such as the trp gene in E. coli as well as other biosynthetic genes. 

Usually, the above described components, comprising a promoter, signal sequence (if desired), 
coding sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as bacteria. The replicon will
have a replication system, thus allowing it to be maintained in a prokaryotic host either for
expression or for cloning and amplification. In addition, a replicon may be either a high or low
copy number plasmid. A high copy number plasmid will generally have a copy number ranging
from about 5 to about 200, and usually about 10 to about 150. A host containing a high copy
number plasmid will preferably contain at least about 10, and more preferably at least about 20 
plasmids. Either a high or low copy number vector may be selected, depending upon the effect of 
the vector and the foreign protein on the host. 

Alternatively, the expression constructs can be integrated into the bacterial genome with an 
integrating vector. Integrating vectors usually contain at least one sequence homologous to the
bacterial chromosome that allows the vector to integrate. Integrations appear to result from
recombinations between homologous DNA in the vector and the bacterial chromosome. For
example, integrating vectors constructed with DNA from various Bacillus strains integrate into the
Bacillus chromosome (EP-A-0 127 328). Integrating vectors may also be comprised of
bacteriophage or transposon sequences.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers 
to allow for the selection of bacterial strains that have been transformed. Selectable markers can 
be expressed in the bacterial host and may include genes which render bacteria resistant to drugs 
such as ampicillin, chloramphenicol, erythromycin, kanamycin (neomycin), and tetracycline 
[Davies et al. (1978) Annu. Rev. Microbiol. 32:469]. Selectable markers may also include
biosynthetic genes, such as those in the histidine, tryptophan, and leucine biosynthetic pathways. 

Alternatively, some of the above described components can be put together in transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either
maintained in a replicon or developed into an integrating vector, as described above. 

Expression and transformation vectors, either extra-chromosomal replicons or integrating vectors,
have been developed for transformation into many bacteria. For example, expression vectors have
been developed for, inter alia, the following bacteria: Bacillus subtilis [Palva et al. (1982) Proc.
Natl. Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541], Escherichia
coli [Shimatake et al. (1981) Nature 292:128; Amann et al. (1985) Gene 40:183; Studier et al.
(1986) J. Mol. Biol. 189:113; EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907],
Streptococcus cremoris [Powell et al. (1988) Appl. Environ. Microbiol. 54:655], Streptococcus
lividans [Powell et al. (1988) Appl. Environ. Microbiol. 54:655], Streptomyces lividans [US patent
4,745,056].

Methods of introducing exogenous DNA into bacterial hosts are well-known in the art, and usually
include either the transformation of bacteria treated with CaCl2 or other agents, such as divalent
cations and DMSO. DNA can also be introduced into bacterial cells by electroporation.
Transformation procedures usually vary with the bacterial species to be transformed. See eg.
[Masson et al. (1989) FEMS Microbiol. Lett. 60:273; Palva et al. (1982) Proc. Natl. Acad. Sci. USA
79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541, Bacillus], [Miller et al. (1988)
Proc. Natl. Acad. Sci. 85:856; Wang et al. (1990) J. Bacteriol. 172:949, Campylobacter], [Cohen
et al. (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al. (1988) Nucleic Acids Res. 16:6127;
Kushner (1978) "An improved method for transformation of Escherichia coli with ColE1-derived
plasmids." In Genetic Engineering: Proceedings of the International Symposium on Genetic
Engineering (eds. H.W. Boyer and S. Nicosia); Mandel et al. (1970) J. Mol. Biol. 53:159; Taketo
(1988) Biochim. Biophys. Acta 949:318; Escherichia], [Chassy et al. (1987) FEMS Microbiol. Lett.
44:173, Lactobacillus]; [Fiedler et al. (1988) Anal. Biochem. 170:38, Pseudomonas]; [Augustin et
al. (1990) FEMS Microbiol. Lett. 66:203, Staphylococcus], [Barany et al. (1980) J. Bacteriol.
144:698; Harlander (1987) "Transformation of Streptococcus lactis by electroporation," in:
Streptococcal Genetics (ed. J. Ferretti and R. Curtiss III); Perry et al. (1981) Infect. Immun.
32:1295; Powell et al. (1988) Appl. Environ. Microbiol. 54:655; Somkuti et al. (1987) Proc. 4th
Eur. Cong. Biotechnology 1:412, Streptococcus].

v. Yeast Expression

Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is any
DNA sequence capable of binding yeast RNA polymerase and initiating the downstream (3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a
transcription initiation region which is usually placed proximal to the 5' end of the coding sequence.
This transcription initiation region usually includes an RNA polymerase binding site (the "TATA
Box") and a transcription initiation site. A yeast promoter may also have a second domain called
an upstream activator sequence (UAS), which, if present, is usually distal to the structural gene.
The UAS permits regulated (inducible) expression. Constitutive expression occurs in the absence
of a UAS. Regulated expression may be either positive or negative, thereby either enhancing or
reducing transcription. 

Yeast is a fermenting organism with an active metabolic pathway, therefore sequences encoding 
enzymes in the metabolic pathway provide particularly useful promoter sequences. Examples 
include alcohol dehydrogenase (ADH) (EP-A-0 284 044), enolase, glucokinase,
glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate-dehydrogenase (GAP or GAPDH), hexokinase,
phosphofructokinase, 3-phosphoglycerate mutase, and pyruvate kinase (PyK) (EPO-A-0 329 203).
The yeast PHO5 gene, encoding acid phosphatase, also provides useful promoter sequences
[Myanohara et al. (1983) Proc. Natl. Acad. Sci. USA 80:1].

In addition, synthetic promoters which do not occur in nature also function as yeast promoters. For
example, UAS sequences of one yeast promoter may be joined with the transcription activation
region of another yeast promoter, creating a synthetic hybrid promoter. Examples of such hybrid
promoters include the ADH regulatory sequence linked to the GAP transcription activation region
(US Patent Nos. 4,876,197 and 4,880,734). Other examples of hybrid promoters include promoters
which consist of the regulatory sequences of either the ADH2, GAL4, GAL10, or PHO5 genes,
combined with the transcriptional activation region of a glycolytic enzyme gene such as GAP or
PyK (EP-A-0 164 556). Furthermore, a yeast promoter can include naturally occurring promoters
of non-yeast origin that have the ability to bind yeast RNA polymerase and initiate transcription.
Examples of such promoters include, inter alia, [Cohen et al. (1980) Proc. Natl. Acad. Sci. USA
77:1078; Henikoff et al. (1981) Nature 283:835; Hollenberg et al. (1981) Curr. Topics Microbiol.
Immunol. 96:119; Hollenberg et al. (1979) "The Expression of Bacterial Antibiotic Resistance
Genes in the Yeast Saccharomyces cerevisiae," in: Plasmids of Medical, Environmental and
Commercial Importance (eds. K.N. Timmis and A. Puhler); Mercerau-Puigalon et al. (1980) Gene
11:163; Panthier et al. (1980) Curr. Genet. 2:109].

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be directly
linked with the DNA molecule, in which case the first amino acid at the N-terminus of the 
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If 
desired, methionine at the N-terminus may be cleaved from the protein by in vitro incubation with 
cyanogen bromide. 
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Fusion proteins provide an alternative for yeast expression systems, as well as in mammalian, 
baculovinis, and bacterial expression systems. Usually, a DNA sequence encoding the N-terminal 
portion of an endogenous yeast protein, or other stable protein, is fused to the 5' end of 
heterologous coding sequences. Upon expression, this construct will provide a fusion of the two 

5 amino acid sequences. For example, the yeast or human superoxide dismutase (SOD) gene, can be 
linked at the 5' terminus of a foreign gene and expressed in yeast. The DNA sequence at the 
junction of the two amino acid sequences may or may not encode a cleavable site. See eg. EP-A-0 
196 056. Another example is a ubiquitin fusion protein. Such a fusion protein is made with the 
ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin-specific 

10 processing protease) to cleave the ubiquitin from the foreign protein. Through this method, 
therefore, native foreign protein can be isolated {eg. WO88/024066). 
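
The two routes described above (direct expression with an N-terminal methionine, and an in-frame
N-terminal fusion) can be illustrated with a minimal sketch. It assumes the standard genetic code
and relies on the general chemistry that cyanogen bromide cleaves on the C-terminal side of every
methionine, not only the N-terminal one. The nucleotide sequences PARTNER_5PRIME and FOREIGN_CDS
and all function names are hypothetical placeholders chosen for illustration; they are not
sequences of the invention.

```python
from __future__ import annotations

from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf: str) -> str:
    """Translate an open reading frame, stopping at the first stop codon."""
    if len(orf) % 3 != 0:
        raise ValueError("length is not a multiple of 3; the fusion would be out of frame")
    protein = []
    for i in range(0, len(orf), 3):
        aa = CODON_TABLE[orf[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def cnbr_fragments(protein: str) -> list[str]:
    """Split a protein after each methionine, as cyanogen bromide would."""
    fragments, current = [], ""
    for aa in protein:
        current += aa
        if aa == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


# Hypothetical placeholder sequences (assumptions, not from the specification).
PARTNER_5PRIME = "ATGGCTACTAAAGCT"    # 5' portion of a stable partner protein
FOREIGN_CDS = "GGTTCTATGGAATTATAA"    # foreign coding sequence ending in a stop codon

if __name__ == "__main__":
    direct = translate("ATG" + FOREIGN_CDS)          # promoter linked directly
    fusion = translate(PARTNER_5PRIME + FOREIGN_CDS)  # N-terminal fusion
    print("direct expression:", direct, cnbr_fragments(direct))
    print("fusion protein:   ", fusion, cnbr_fragments(fusion))
```

In this toy case the foreign sequence contains an internal methionine, so cyanogen bromide would
also cut inside the protein. A processing site encoded at the fusion junction avoids that
limitation in cases where internal methionines are present.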

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that
provides for secretion in yeast of the foreign protein. Preferably, there are processing sites encoded
between the leader fragment and the foreign gene that can be cleaved either in vivo or in vitro. The
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids
which direct the secretion of the protein from the cell.

DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins,
such as the yeast invertase gene (EP-A-0 012 873; JPO. 62,096,086) and the A-factor gene (US
patent 4,588,684). Alternatively, leaders of non-yeast origin, such as an interferon leader, exist that
also provide for secretion in yeast (EP-A-0 060 057).

A preferred class of secretion leaders are those that employ a fragment of the yeast alpha-factor
gene, which contains both a "pre" signal sequence, and a "pro" region. The types of alpha-factor
fragments that can be employed include the full-length pre-pro alpha factor leader (about 83 amino
acid residues) as well as truncated alpha-factor leaders (usually about 25 to about 50 amino acid
residues) (US Patents 4,546,083 and 4,870,008; EP-A-0 324 274). Additional leaders employing
an alpha-factor leader fragment that provides for secretion include hybrid alpha-factor leaders made
with a presequence of a first yeast, but a pro-region from a second yeast alpha-factor (eg. see WO
89/02463).

Usually, transcription termination sequences recognized by yeast are regulatory regions located 3'
to the translation stop codon, and thus together with the promoter flank the coding sequence. These
sequences direct the transcription of an mRNA which can be translated into the polypeptide
encoded by the DNA. Examples include transcription terminator sequences and other
yeast-recognized termination sequences, such as those of genes coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if desired), coding
sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as yeast or bacteria. The
replicon may have two replication systems, thus allowing it to be maintained, for example, in yeast
for expression and in a prokaryotic host for cloning and amplification. Examples of such
yeast-bacteria shuttle vectors include YEp24 [Botstein et al. (1979) Gene 5:17-24], pC1/1 [Brake
et al. (1984) Proc. Natl. Acad. Sci. USA 81:4642-4646], and YRp17 [Stinchcomb et al. (1982) J. Mol.
Biol. 158:157]. In addition, a replicon may be either a high or low copy number plasmid. A high
copy number plasmid will generally have a copy number ranging from about 5 to about 200, and
usually about 10 to about 150. A host containing a high copy number plasmid will preferably have
at least about 10, and more preferably at least about 20. Either a high or low copy number vector
may be selected, depending upon the effect of the vector and the foreign protein on the host. See
eg. Brake et al., supra.

Alternatively, the expression constructs can be integrated into the yeast genome with an integrating
vector. Integrating vectors usually contain at least one sequence homologous to a yeast
chromosome that allows the vector to integrate, and preferably contain two homologous sequences
flanking the expression construct. Integrations appear to result from recombinations between
homologous DNA in the vector and the yeast chromosome [Orr-Weaver et al. (1983) Methods in
Enzymol. 101:228-245]. An integrating vector may be directed to a specific locus in yeast by
selecting the appropriate homologous sequence for inclusion in the vector. See Orr-Weaver et al.,
supra. One or more expression constructs may integrate, possibly affecting levels of recombinant
protein produced [Rine et al. (1983) Proc. Natl. Acad. Sci. USA 80:6750]. The chromosomal
sequences included in the vector can occur either as a single segment in the vector, which results
in the integration of the entire vector, or as two segments homologous to adjacent segments in the
chromosome and flanking the expression construct in the vector, which can result in the stable
integration of only the expression construct.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers
to allow for the selection of yeast strains that have been transformed. Selectable markers may
include biosynthetic genes that can be expressed in the yeast host, such as ADE2, HIS4, LEU2,
and TRP1, as well as ALG7 and the G418 resistance gene, which confer resistance in yeast cells to
tunicamycin and G418, respectively. In addition, a suitable selectable marker may also provide
yeast with the ability to grow in the presence of toxic compounds, such as metal. For example, the
presence of CUP1 allows yeast to grow in the presence of copper ions [Butt et al. (1987) Microbiol.
Rev. 51:351].

Alternatively, some of the above described components can be put together into transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either 
maintained in a replicon or developed into an integrating vector, as described above. 

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors,
have been developed for transformation into many yeasts. For example, expression vectors have
been developed for, inter alia, the following yeasts: Candida albicans [Kurtz, et al. (1986) Mol.
Cell. Biol. 5:142], Candida maltosa [Kunze, et al. (1985) J. Basic Microbiol. 25:141], Hansenula
polymorpha [Gleeson, et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol.
Gen. Genet. 202:302], Kluyveromyces fragilis [Das, et al. (1984) J. Bacteriol. 158:1165],
Kluyveromyces lactis [De Louvencourt et al. (1983) J. Bacteriol. 154:111; Van den Berg et al.
(1990) Bio/Technology 5:135], Pichia guilliermondii [Kunze et al. (1985) J. Basic Microbiol.
25:141], Pichia pastoris [Cregg, et al. (1985) Mol. Cell. Biol. 5:3376; US Patent Nos. 4,837,148
and 4,929,555], Saccharomyces cerevisiae [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA
75:1929; Ito et al. (1983) J. Bacteriol. 153:163], Schizosaccharomyces pombe [Beach and Nurse
(1981) Nature 300:706], and Yarrowia lipolytica [Davidow, et al. (1985) Curr. Genet. 10:39;
Gaillardin, et al. (1985) Curr. Genet. 10:49].

Methods of introducing exogenous DNA into yeast hosts are well-known in the art, and usually
include either the transformation of spheroplasts or of intact yeast cells treated with alkali cations.
Transformation procedures usually vary with the yeast species to be transformed. See eg. [Kurtz
et al. (1986) Mol. Cell. Biol. 5:142; Kunze et al. (1985) J. Basic Microbiol. 25:141; Candida];
[Gleeson et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen. Genet.
202:302; Hansenula]; [Das et al. (1984) J. Bacteriol. 158:1165; De Louvencourt et al. (1983) J.
Bacteriol. 154:1165; Van den Berg et al. (1990) Bio/Technology 5:135; Kluyveromyces]; [Cregg
et al. (1985) Mol. Cell. Biol. 5:3376; Kunze et al. (1985) J. Basic Microbiol. 25:141; US Patent
Nos. 4,837,148 and 4,929,555; Pichia]; [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929;
Ito et al. (1983) J. Bacteriol. 153:163; Saccharomyces]; [Beach and Nurse (1981) Nature 300:706;
Schizosaccharomyces]; [Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al. (1985) Curr.
Genet. 10:49; Yarrowia].

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides composed of
at least one antibody combining site. An "antibody combining site" is the three-dimensional
binding space with an internal surface shape and charge distribution complementary to the features
of an epitope of an antigen, which allows a binding of the antibody with the antigen. "Antibody"
includes, for example, vertebrate antibodies, hybrid antibodies, chimeric antibodies, humanised
antibodies, altered antibodies, univalent antibodies, Fab proteins, and single domain antibodies.

Antibodies against the proteins of the invention are useful for affinity chromatography,
immunoassays, and distinguishing/identifying Neisserial proteins.

Antibodies to the proteins of the invention, both polyclonal and monoclonal, may be prepared by
conventional methods. In general, the protein is first used to immunize a suitable animal, preferably
a mouse, rat, rabbit or goat. Rabbits and goats are preferred for the preparation of polyclonal sera
due to the volume of serum obtainable, and the availability of labeled anti-rabbit and anti-goat
antibodies. Immunization is generally performed by mixing or emulsifying the protein in saline,
preferably in an adjuvant such as Freund's complete adjuvant, and injecting the mixture or
emulsion parenterally (generally subcutaneously or intramuscularly). A dose of 50-200 µg/injection
is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or more
injections of the protein in saline, preferably using Freund's incomplete adjuvant. One may
alternatively generate antibodies by in vitro immunization using methods known in the art, which
for the purposes of this invention is considered equivalent to in vivo immunization. Polyclonal
antisera are obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating at 4°C for 2-18 hours. The serum is
recovered by centrifugation (eg. 1,000g for 10 minutes). About 20-50 ml per bleed may be obtained
from rabbits.
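
The centrifugation step above is given as a relative centrifugal force (about 1,000g), whereas
many centrifuges are set in revolutions per minute. The following sketch converts between the two
using the standard relation RCF = 1.118 x 10^-5 x r x N^2 (r in cm, N in rpm). The rotor radius of
10 cm is an assumed example value; the specification does not name a rotor.

```python
import math

# Standard conversion constant for radius in centimetres and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_to_rpm(rcf_g: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    if rcf_g <= 0 or radius_cm <= 0:
        raise ValueError("RCF and radius must be positive")
    return math.sqrt(rcf_g / (RCF_CONSTANT * radius_cm))


def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) produced at a given rotor speed."""
    if rpm < 0 or radius_cm <= 0:
        raise ValueError("rpm must be non-negative and radius positive")
    return RCF_CONSTANT * radius_cm * rpm ** 2


if __name__ == "__main__":
    ASSUMED_RADIUS_CM = 10.0  # assumption: depends on the rotor actually used
    rpm = rcf_to_rpm(1000.0, ASSUMED_RADIUS_CM)
    print(f"1,000 x g at r = {ASSUMED_RADIUS_CM} cm needs about {rpm:.0f} rpm")
```

Because the required speed scales with the inverse square root of the radius, the same 1,000g
step corresponds to noticeably different rpm settings on different rotors.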

Monoclonal antibodies are prepared using the standard method of Kohler & Milstein [Nature
(1975) 256:495-96], or a modification thereof. Typically, a mouse or rat is immunized as described
above. However, rather than bleeding the animal to extract serum, the spleen (and optionally
several large lymph nodes) is removed and dissociated into single cells. If desired, the spleen cells
may be screened (after removal of nonspecifically adherent cells) by applying a cell suspension to
a plate or well coated with the protein antigen. B-cells expressing membrane-bound
immunoglobulin specific for the antigen bind to the plate, and are not rinsed away with the rest of
the suspension. Resulting B-cells, or all dissociated spleen cells, are then induced to fuse with
myeloma cells to form hybridomas, and are cultured in a selective medium (eg. hypoxanthine,
aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by limiting dilution,
and are assayed for the production of antibodies which bind specifically to the immunizing antigen
(and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then
cultured either in vitro (eg. in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites
in mice).
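
The limiting dilution step mentioned above is commonly reasoned about with Poisson statistics.
The sketch below is an illustration only: it assumes that cells distribute among wells according
to a Poisson distribution and that every seeded cell grows, neither of which is stated in the
specification. The function name and the seeding densities are hypothetical.

```python
import math


def prob_single_clone(mean_cells_per_well: float) -> float:
    """Probability that a well showing growth was seeded by exactly one cell.

    Poisson model: P(k cells) = lam**k * exp(-lam) / k!
    The result is P(1 cell) / P(at least 1 cell).
    """
    lam = mean_cells_per_well
    if lam <= 0:
        raise ValueError("mean cells per well must be positive")
    p_one = lam * math.exp(-lam)
    p_growth = 1.0 - math.exp(-lam)
    return p_one / p_growth


if __name__ == "__main__":
    for lam in (0.3, 1.0, 3.0):  # assumed example seeding densities
        print(f"mean {lam} cells/well -> P(monoclonal | growth) = "
              f"{prob_single_clone(lam):.2f}")
```

Under these assumptions, lower seeding densities give a higher chance that a growing well is
derived from a single hybridoma, at the cost of more empty wells.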

If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional
techniques. Suitable labels include fluorophores, chromophores, radioactive atoms (particularly ³²P
and ¹²⁵I), electron-dense reagents, enzymes, and ligands having specific binding partners. Enzymes
are typically detected by their activity. For example, horseradish peroxidase is usually detected by its
ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a
spectrophotometer. "Specific binding partner" refers to a protein capable of binding a ligand molecule
with high specificity, as for example in the case of an antigen and a monoclonal antibody specific
therefor. Other specific binding partners include biotin and avidin or streptavidin, IgG and protein A,
and the numerous receptor-ligand couples known in the art. It should be understood that the above
description is not meant to categorize the various labels into distinct classes, as the same label may
serve in several different modes. For example, ¹²⁵I may serve as a radioactive label or as an
electron-dense reagent. HRP may serve as enzyme or as antigen for a MAb. Further, one may combine
various labels for desired effect. For example, MAbs and avidin also require labels in the practice of
this invention: thus, one might label a MAb with biotin, and detect its presence with avidin labeled
with ¹²⁵I, or with an anti-biotin MAb labeled with HRP. Other permutations and possibilities will be
readily apparent to those of ordinary skill in the art, and are considered as equivalents within the scope
of the instant invention.

Pharmaceutical Compositions 

Pharmaceutical compositions can comprise either polypeptides, antibodies, or nucleic acid of the
invention. The pharmaceutical compositions will comprise a therapeutically effective amount of
either polypeptides, antibodies, or polynucleotides of the claimed invention.

The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic
agent to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a detectable
therapeutic or preventative effect. The effect can be detected by, for example, chemical markers or
antigen levels. Therapeutic effects also include reduction in physical symptoms, such as decreased
body temperature. The precise effective amount for a subject will depend upon the subject's size
and health, the nature and extent of the condition, and the therapeutics or combination of
therapeutics selected for administration. Thus, it is not useful to specify an exact effective amount
in advance. However, the effective amount for a given situation can be determined by routine
experimentation and is within the judgement of the clinician.

For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50 mg/kg
or 0.05 mg/kg to about 10 mg/kg of the DNA constructs in the individual to which they are administered.
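
As a worked illustration of the per-kilogram ranges above, the sketch below converts them into
absolute amounts for a given body weight. The 70 kg body weight and the function name are assumed
example values only; as noted above, the actual amount is a matter for routine experimentation and
clinical judgement.

```python
from __future__ import annotations

# Dose ranges stated in the text, in mg per kg of body weight.
BROAD_RANGE_MG_PER_KG = (0.01, 50.0)
NARROW_RANGE_MG_PER_KG = (0.05, 10.0)


def dose_range_mg(body_weight_kg: float,
                  range_mg_per_kg: tuple[float, float]) -> tuple[float, float]:
    """Absolute dose range (mg) for one subject from a mg/kg range."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low, high = range_mg_per_kg
    return (low * body_weight_kg, high * body_weight_kg)


if __name__ == "__main__":
    ASSUMED_WEIGHT_KG = 70.0  # assumption: illustrative adult body weight
    for label, rng in (("broad", BROAD_RANGE_MG_PER_KG),
                       ("narrow", NARROW_RANGE_MG_PER_KG)):
        low, high = dose_range_mg(ASSUMED_WEIGHT_KG, rng)
        print(f"{label} range: {low:.2f} mg to {high:.0f} mg")
```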

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term
"pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent, such
as antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any
pharmaceutical carrier that does not itself induce the production of antibodies harmful to the
individual receiving the composition, and which may be administered without undue toxicity. Suitable
carriers may be large, slowly metabolized macromolecules such as proteins, polysaccharides,
polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, and inactive virus
particles. Such carriers are well known to those of ordinary skill in the art.

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as
hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids
such as acetates, propionates, malonates, benzoates, and the like. A thorough discussion of
pharmaceutically acceptable excipients is available in Remington's Pharmaceutical Sciences (Mack
Pub. Co., N.J. 1991).

Pharmaceutically acceptable carriers in therapeutic compositions may contain liquids such as water,
saline, glycerol and ethanol. Additionally, auxiliary substances, such as wetting or emulsifying agents,
pH buffering substances, and the like, may be present in such vehicles. Typically, the therapeutic
compositions are prepared as injectables, either as liquid solutions or suspensions; solid forms suitable
for solution in, or suspension in, liquid vehicles prior to injection may also be prepared. Liposomes
are included within the definition of a pharmaceutically acceptable carrier.

Delivery Methods

Once formulated, the compositions of the invention can be administered directly to the subject. The 
subjects to be treated can be animals; in particular, human subjects can be treated. 

Direct delivery of the compositions will generally be accomplished by injection, either
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the interstitial
space of a tissue. The compositions can also be administered into a lesion. Other modes of
administration include oral and pulmonary administration, suppositories, and transdermal or
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage
treatment may be a single dose schedule or a multiple dose schedule.

Vaccines 

Vaccines according to the invention may either be prophylactic (ie. to prevent infection) or
therapeutic (ie. to treat disease after infection).

Such vaccines comprise immunising antigen(s), immunogen(s), polypeptide(s), protein(s) or nucleic acid,
usually in combination with "pharmaceutically acceptable carriers," which include any carrier that does
not itself induce the production of antibodies harmful to the individual receiving the composition.
Suitable carriers are typically large, slowly metabolized macromolecules such as proteins,
polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers,
lipid aggregates (such as oil droplets or liposomes), and inactive virus particles. Such carriers are well
known to those of ordinary skill in the art. Additionally, these carriers may function as
immunostimulating agents ("adjuvants"). Furthermore, the antigen or immunogen may be conjugated to
a bacterial toxoid, such as a toxoid from diphtheria, tetanus, cholera, H. pylori, etc. pathogens.

Preferred adjuvants to enhance effectiveness of the composition include, but are not limited to: (1)
aluminum salts (alum), such as aluminum hydroxide, aluminum phosphate, aluminum sulfate, etc;
(2) oil-in-water emulsion formulations (with or without other specific immunostimulating agents
such as muramyl peptides (see below) or bacterial cell wall components), such as for example (a)
MF59™ (WO 90/14837; Chapter 10 in Vaccine design: the subunit and adjuvant approach, eds.
Powell & Newman, Plenum Press 1995), containing 5% Squalene, 0.5% Tween 80, and 0.5% Span
85 (optionally containing various amounts of MTP-PE (see below), although not required)
formulated into submicron particles using a microfluidizer such as Model 110Y microfluidizer
(Microfluidics, Newton, MA), (b) SAF, containing 10% Squalane, 0.4% Tween 80, 5%
pluronic-blocked polymer L121, and thr-MDP (see below) either microfluidized into a submicron
emulsion or vortexed to generate a larger particle size emulsion, and (c) Ribi™ adjuvant system
(RAS), (Ribi Immunochem, Hamilton, MT) containing 2% Squalene, 0.2% Tween 80, and one or
more bacterial cell wall components from the group consisting of monophosphorylipid A (MPL),
trehalose dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL + CWS (Detox™);
(3) saponin adjuvants, such as Stimulon™ (Cambridge Bioscience, Worcester, MA), or particles
generated therefrom such as ISCOMs (immunostimulating complexes); (4) Complete Freund's
Adjuvant (CFA) and Incomplete Freund's Adjuvant (IFA); (5) cytokines, such as interleukins (eg.
IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (eg. gamma interferon), macrophage
colony stimulating factor (M-CSF), tumor necrosis factor (TNF), etc; and (6) other substances that
act as immunostimulating agents to enhance the effectiveness of the composition. Alum and
MF59™ are preferred.
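
The percentage compositions quoted above for the oil-in-water emulsions can be turned into
component amounts for a batch with a short calculation. The sketch below assumes that the
percentages are volume/volume, which the text does not state, and it ignores the optional
immunostimulants (MTP-PE, thr-MDP, bacterial cell wall components). The batch size and all names
in the code are illustrative assumptions.

```python
from __future__ import annotations

# Percentages as quoted in the text; treated here as % v/v (an assumption).
FORMULATIONS = {
    "MF59": {"squalene": 5.0, "Tween 80": 0.5, "Span 85": 0.5},
    "SAF": {"squalane": 10.0, "Tween 80": 0.4, "pluronic-blocked polymer L121": 5.0},
    "RAS": {"squalene": 2.0, "Tween 80": 0.2},
}


def component_volumes_ml(name: str, batch_ml: float) -> dict[str, float]:
    """Volume of each listed component, plus the aqueous remainder, for one batch."""
    if batch_ml <= 0:
        raise ValueError("batch volume must be positive")
    percentages = FORMULATIONS[name]
    volumes = {comp: batch_ml * pct / 100.0 for comp, pct in percentages.items()}
    volumes["aqueous phase (remainder)"] = batch_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for formulation in FORMULATIONS:
        print(formulation)
        for component, ml in component_volumes_ml(formulation, 10.0).items():
            print(f"  {component}: {ml:.2f} ml")
```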

As mentioned above, muramyl peptides include, but are not limited to,
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP),
N-acetyl-normuramyl-L-alanyl-D-isoglutamine (nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine
(MTP-PE), etc.

The immunogenic compositions (eg. the immunising antigen/immunogen/polypeptide/protein/
nucleic acid, pharmaceutically acceptable carrier, and adjuvant) typically will contain diluents, such
as water, saline, glycerol, ethanol, etc. Additionally, auxiliary substances, such as wetting or
emulsifying agents, pH buffering substances, and the like, may be present in such vehicles.

Typically, the immunogenic compositions are prepared as injectables, either as liquid solutions or
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection
may also be prepared. The preparation also may be emulsified or encapsulated in liposomes for
enhanced adjuvant effect, as discussed above under pharmaceutically acceptable carriers.

Immunogenic compositions used as vaccines comprise an immunologically effective amount of the
antigenic or immunogenic polypeptides, as well as any other of the above-mentioned components,
as needed. By "immunologically effective amount", it is meant that the administration of that
amount to an individual, either in a single dose or as part of a series, is effective for treatment or
prevention. This amount varies depending upon the health and physical condition of the individual
to be treated, the taxonomic group of individual to be treated (eg. nonhuman primate, primate, etc.),
the capacity of the individual's immune system to synthesize antibodies, the degree of protection
desired, the formulation of the vaccine, the treating doctor's assessment of the medical situation,
and other relevant factors. It is expected that the amount will fall in a relatively broad range that
can be determined through routine trials.

The immunogenic compositions are conventionally administered parenterally, eg. by injection,
either subcutaneously, intramuscularly, or transdermally/transcutaneously (eg. WO98/20734).
Additional formulations suitable for other modes of administration include oral and pulmonary
formulations, suppositories, and transdermal applications. Dosage treatment may be a single dose
schedule or a multiple dose schedule. The vaccine may be administered in conjunction with other
immunoregulatory agents.

As an alternative to protein-based vaccines, DNA vaccination may be employed [eg. Robinson &
Torres (1997) Seminars in Immunology 9:271-283; Donnelly et al. (1997) Annu Rev Immunol
15:617-648; see later herein].

Gene Delivery Vehicles

Gene therapy vehicles for delivery of constructs including a coding sequence of a therapeutic of
the invention, to be delivered to the mammal for expression in the mammal, can be administered
either locally or systemically. These constructs can utilize viral or non-viral vector approaches in
an in vivo or ex vivo modality. Expression of such coding sequence can be induced using endogenous
mammalian or heterologous promoters. Expression of the coding sequence in vivo can be either
constitutive or regulated.

The invention includes gene delivery vehicles capable of expressing the contemplated nucleic acid
sequences. The gene delivery vehicle is preferably a viral vector and, more preferably, a retroviral,
adenoviral, adeno-associated viral (AAV), herpes viral, or alphavirus vector. The viral vector can
also be an astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus, parvovirus,
picornavirus, poxvirus, or togavirus viral vector. See generally, Jolly (1994) Cancer Gene Therapy
1:51-64; Kimura (1994) Human Gene Therapy 5:845-852; Connelly (1995) Human Gene Therapy
6:185-193; and Kaplitt (1994) Nature Genetics 6:148-153.

Retroviral vectors are well known in the art and we contemplate that any retroviral gene therapy vector
is employable in the invention, including B, C and D type retroviruses, xenotropic retroviruses (for
example, NZB-X1, NZB-X2 and NZB9-1 (see O'Neill (1985) J. Virol. 53:160)), polytropic retroviruses
(eg. MCF and MCF-MLV (see Kelly (1983) J. Virol. 45:291)), spumaviruses and lentiviruses. See RNA
Tumor Viruses, Second Edition, Cold Spring Harbor Laboratory, 1985.

Portions of the retroviral gene therapy vector may be derived from different retroviruses. For
example, retrovector LTRs may be derived from a Murine Sarcoma Virus, a tRNA binding site
from a Rous Sarcoma Virus, a packaging signal from a Murine Leukemia Virus, and an origin of
second strand synthesis from an Avian Leukosis Virus.

These recombinant retroviral vectors may be used to generate transduction competent retroviral
vector particles by introducing them into appropriate packaging cell lines (see US patent
5,591,624). Retrovirus vectors can be constructed for site-specific integration into host cell DNA
by incorporation of a chimeric integrase enzyme into the retroviral particle (see WO96/37626). It
is preferable that the recombinant viral vector is a replication defective recombinant virus.

Packaging cell lines suitable for use with the above-described retrovirus vectors are well known
in the art, are readily prepared (see WO95/30763 and WO92/05266), and can be used to create
producer cell lines (also termed vector cell lines or "VCLs") for the production of recombinant
vector particles. Preferably, the packaging cell lines are made from human parent cells (eg. HT1080
cells) or mink parent cell lines, which eliminates inactivation in human serum.

Preferred retroviruses for the construction of retroviral gene therapy vectors include Avian
Leukosis Virus, Bovine Leukemia Virus, Murine Leukemia Virus, Mink-Cell Focus-Inducing
Virus, Murine Sarcoma Virus, Reticuloendotheliosis Virus and Rous Sarcoma Virus. Particularly
preferred Murine Leukemia Viruses include 4070A and 1504A (Hartley and Rowe (1976) J Virol
19:19-25), Abelson (ATCC No. VR-999), Friend (ATCC No. VR-245), Graffi, Gross (ATCC No.
VR-590), Kirsten, Harvey Sarcoma Virus and Rauscher (ATCC No. VR-998) and Moloney Murine
Leukemia Virus (ATCC No. VR-190). Such retroviruses may be obtained from depositories or
collections such as the American Type Culture Collection ("ATCC") in Rockville, Maryland or
isolated from known sources using commonly available techniques.

Exemplary known retroviral gene therapy vectors employable in this invention include those
described in patent applications GB2200651, EP0415731, EP0345242, EP0334301, WO89/02468,
WO89/05349, WO89/09271, WO90/02806, WO90/07936, WO94/03622, WO93/25698,
WO93/25234, WO93/11230, WO93/10218, WO91/02805, WO91/02825, WO95/07994, US
5,219,740, US 4,405,712, US 4,861,719, US 4,980,289, US 4,777,127, US 5,591,624. See also Vile
(1993) Cancer Res 53:3860-3864; Vile (1993) Cancer Res 53:962-967; Ram (1993) Cancer Res
53:83-88; Takamiya (1992) J Neurosci Res 33:493-503; Baba (1993) J Neurosurg
79:729-735; Mann (1983) Cell 33:153; Cane (1984) Proc Natl Acad Sci 81:6349; and Miller (1990)
Human Gene Therapy 1.

Human adenoviral gene therapy vectors are also known in the art and employable in this invention.
See, for example, Berkner (1988) Biotechniques 6:616 and Rosenfeld (1991) Science 252:431, and
WO93/07283, WO93/06223, and WO93/07282. Exemplary known adenoviral gene therapy vectors
employable in this invention include those described in the above referenced documents and in
WO94/12649, WO93/03769, WO93/19191, WO94/28938, WO95/11984, WO95/00655,
WO95/27071, WO95/29993, WO95/34671, WO96/05320, WO94/08026, WO94/11506,
WO93/06223, WO94/24299, WO95/14102, WO95/24297, WO95/02697, WO94/28152,
WO94/24299, WO95/09241, WO95/25807, WO95/05835, WO94/18922 and WO95/09654.

Alternatively, administration of DNA linked to killed adenovirus as described in Curiel (1992)
Hum. Gene Ther. 3:147-154 may be employed. The gene delivery vehicles of the invention also
include adenovirus associated virus (AAV) vectors. Leading and preferred examples of such
vectors for use in this invention are the AAV-2 based vectors disclosed in Srivastava,
WO93/09239. Most preferred AAV vectors comprise the two AAV inverted terminal repeats in
which the native D-sequences are modified by substitution of nucleotides, such that at least 5 native
nucleotides and up to 18 native nucleotides, preferably at least 10 native nucleotides up to 18 native
nucleotides, most preferably 10 native nucleotides are retained and the remaining nucleotides of
the D-sequence are deleted or replaced with non-native nucleotides. The native D-sequences of the
AAV inverted terminal repeats are sequences of 20 consecutive nucleotides in each AAV inverted
terminal repeat (ie. there is one sequence at each end) which are not involved in HP formation. The
non-native replacement nucleotide may be any nucleotide other than the nucleotide found in the
native D-sequence in the same position. Other employable exemplary AAV vectors are pWP-19 and
pWN-1, both of which are disclosed in Nahreini (1993) Gene 124:257-262. Another example of
such an AAV vector is psub201 (see Samulski (1987) J. Virol. 61:3096). Another exemplary AAV
vector is the Double-D ITR vector. Construction of the Double-D ITR vector is disclosed in US
Patent 5,478,745. Still other vectors are those disclosed in Carter US Patent 4,797,368 and
Muzyczka US Patent 5,139,941, Chatterjee US Patent 5,474,935, and Kotin WO94/288157. Yet a
further example of an AAV vector employable in this invention is SSV9AFABTKneo, which
contains the AFP enhancer and albumin promoter and directs expression predominantly in the liver.
Its structure and construction are disclosed in Su (1996) Human Gene Therapy 7:463-470.
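
The D-sequence modification rule described above (20 native nucleotides, of which 5 to 18,
preferably 10 to 18, most preferably 10 are retained) can be expressed as a simple check. The
sketch below is illustrative only: NATIVE_D_PLACEHOLDER is a made-up 20-nucleotide stand-in, not
the actual AAV D-sequence, and the convention of writing a deleted position as "-" in an aligned
string is an assumption introduced here.

```python
# Hypothetical 20-nt stand-in for a native D-sequence (NOT the real AAV sequence).
NATIVE_D_PLACEHOLDER = "ACGTACGTACGTACGTACGT"

# Maps each nucleotide to a different one, to build non-native substitutions.
NON_NATIVE = {"A": "C", "C": "A", "G": "T", "T": "G"}


def retained_native(native: str, modified: str) -> int:
    """Count aligned positions where the modified sequence keeps the native base.

    Both strings must be aligned and of equal length; '-' marks a deletion.
    """
    if len(native) != len(modified):
        raise ValueError("sequences must be aligned to the same length")
    return sum(n == m for n, m in zip(native, modified))


def classify(retained: int) -> str:
    """Place a retained-nucleotide count in the ranges given in the text."""
    if retained == 10:
        return "most preferred"
    if 10 <= retained <= 18:
        return "preferred"
    if 5 <= retained <= 18:
        return "within range"
    return "outside range"


if __name__ == "__main__":
    native = NATIVE_D_PLACEHOLDER
    deleted = native[:10] + "-" * 10                                # last 10 deleted
    substituted = native[:12] + "".join(NON_NATIVE[b] for b in native[12:])
    for label, seq in (("deletion", deleted), ("substitution", substituted)):
        kept = retained_native(native, seq)
        print(f"{label}: {kept} native nucleotides retained -> {classify(kept)}")
```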

Additional AAV gene therapy vectors are described in US 5,354,678, US 5,173,414, US 5,139,941,
and US 5,252,479.

The gene therapy vectors of the invention also include herpes vectors. Leading and preferred
examples are herpes simplex virus vectors containing a sequence encoding a thymidine kinase
polypeptide such as those disclosed in US 5,288,641 and EP0176170 (Roizman). Additional
exemplary herpes simplex virus vectors include HFEM/ICP6-LacZ disclosed in WO95/04139
(Wistar Institute), pHSVlac described in Geller (1988) Science 241:1667-1669 and in WO90/09441
and WO92/07945, HSV Us3::pgC-lacZ described in Fink (1992) Human Gene Therapy 3:11-19
and HSV 7134, 2 RH 105 and GAM described in EP 0453242 (Breakefield), and those deposited
with the ATCC as accession numbers ATCC VR-977 and ATCC VR-260.

Also contemplated are alpha virus gene therapy vectors that can be employed in this invention.
Preferred alpha virus vectors are Sindbis virus vectors. Also employable are vectors derived from
togaviruses, Semliki Forest virus (ATCC VR-67; ATCC VR-1247), Middelburg virus (ATCC VR-370),
Ross River virus (ATCC VR-373; ATCC VR-1246), Venezuelan equine encephalitis virus (ATCC
VR-923; ATCC VR-1250; ATCC VR-1249; ATCC VR-532), and those described in US patents
5,091,309 and 5,217,879 and in WO92/10578. More particularly, those alpha virus vectors described
in US Serial No. 08/405,627, filed March 15, 1995, WO94/21792, WO92/10578, WO95/07994,
US 5,091,309 and US 5,217,879 are employable. Such alpha viruses may be obtained from
depositories or collections such as the ATCC in Rockville, Maryland or isolated from known
sources using commonly available techniques. Preferably, alphavirus vectors with reduced
cytotoxicity are used (see USSN 08/679640).

DNA vector systems such as eukaryotic layered expression systems are also useful for expressing
the nucleic acids of the invention. See WO95/07994 for a detailed description of eukaryotic layered
expression systems. Preferably, the eukaryotic layered expression systems of the invention are
derived from alphavirus vectors and most preferably from Sindbis viral vectors.

Other viral vectors suitable for use in the present invention include those derived from poliovirus, for
example ATCC VR-58 and those described in Evans, Nature 339 (1989) 385 and Sabin (1973) J. Biol.
Standardization 1:115; rhinovirus, for example ATCC VR-1110 and those described in Arnold (1990)
J Cell Biochem L401; pox viruses such as canary pox virus or vaccinia virus, for example ATCC
VR-111 and ATCC VR-2010 and those described in Fisher-Hoch (1989) Proc Natl Acad Sci 86:317;
Flexner (1989) Ann NY Acad Sci 569:86, Flexner (1990) Vaccine 8:17; in US 4,603,112 and US
4,769,330 and WO89/01973; SV40 virus, for example ATCC VR-305 and those described in
Mulligan (1979) Nature 277:108 and Madzak (1992) J Gen Virol 73:1533; influenza virus, for
example ATCC VR-797 and recombinant influenza viruses made employing reverse genetics
techniques as described in US 5,166,057 and in Enami (1990) Proc Natl Acad Sci 87:3802-3805;
Enami & Palese (1991) J Virol 65:2711-2713 and Luytjes (1989) Cell 59:110, (see also McMichael
(1983) N Engl J Med 309:13, and Yap (1978) Nature 273:238 and Nature (1979) 277:108); human
immunodeficiency virus as described in EP-0386882 and in Buchschacher (1992) J. Virol. 66:2731;
measles virus, for example ATCC VR-67 and VR-1247 and those described in EP-0440219; Aura
virus, for example ATCC VR-368; Bebaru virus, for example ATCC VR-600 and ATCC VR-1240;
Cabassou virus, for example ATCC VR-922; Chikungunya virus, for example ATCC VR-64 and
ATCC VR-1241; Fort Morgan Virus, for example ATCC VR-924; Getah virus, for example ATCC
VR-369 and ATCC VR-1243; Kyzylagach virus, for example ATCC VR-927; Mayaro virus, for
example ATCC VR-66; Mucambo virus, for example ATCC VR-580 and ATCC VR-1244; Ndumu
virus, for example ATCC VR-371; Pixuna virus, for example ATCC VR-372 and ATCC VR-1245;
Tonate virus, for example ATCC VR-925; Triniti virus, for example ATCC VR-469; Una virus, for
example ATCC VR-374; Whataroa virus, for example ATCC VR-926; Y-62-33 virus, for example
ATCC VR-375; O'Nyong virus, Eastern encephalitis virus, for example ATCC VR-65 and ATCC
VR-1242; Western encephalitis virus, for example ATCC VR-70, ATCC VR-1251, ATCC VR-622
and ATCC VR-1252; and coronavirus, for example ATCC VR-740 and those described in Hamre
(1966) Proc Soc Exp Biol Med 121:190.

Delivery of the compositions of this invention into cells is not limited to the above mentioned viral
vectors. Other delivery methods and media may be employed such as, for example, nucleic acid
expression vectors, polycationic condensed DNA linked or unlinked to killed adenovirus alone, for
example see US Serial No. 08/366,787, filed December 30, 1994 and Curiel (1992) Hum Gene Ther
3:147-154, ligand linked DNA, for example see Wu (1989) J Biol Chem 264:16985-16987,
eucaryotic cell delivery vehicles, for example see US Serial No. 08/240,030, filed May 9,
1994, and US Serial No. 08/404,796, deposition of photopolymerized hydrogel materials,
hand-held gene transfer particle gun, as described in US Patent 5,149,655, ionizing radiation as
described in US 5,206,152 and in WO92/11033, nucleic charge neutralization or fusion with cell
membranes. Additional approaches are described in Philip (1994) Mol Cell Biol 14:2411-2418 and
in Woffendin (1994) Proc Natl Acad Sci 91:11581-11585.

Particle mediated gene transfer may be employed, for example see US Serial No. 60/023,867.
Briefly, the sequence can be inserted into conventional vectors that contain conventional control
sequences for high level expression, and then incubated with synthetic gene transfer molecules such
as polymeric DNA-binding cations like polylysine, protamine, and albumin, linked to cell targeting
ligands such as asialoorosomucoid, as described in Wu & Wu (1987) J. Biol Chem.
262:4429-4432, insulin as described in Hucked (1990) Biochem Pharmacol 40:253-263, galactose
as described in Plank (1992) Bioconjugate Chem 3:533-539, lactose or transferrin.

Naked DNA may also be employed. Exemplary naked DNA introduction methods are described in 
WO 90/11092 and US 5,580,859. Uptake efficiency may be improved using biodegradable latex 
beads. DNA coated latex beads are efficiently transported into cells after endocytosis initiation by the 
beads. The method may be improved further by treatment of the beads to increase hydrophobicity and
thereby facilitate disruption of the endosome and release of the DNA into the cytoplasm. 

Liposomes that can act as gene delivery vehicles are described in US 5,422,120, WO95/13796,
WO94/23697, WO91/14445 and EP-524,968. As described in USSN 60/023,867, on non-viral
delivery, the nucleic acid sequences encoding a polypeptide can be inserted into conventional
vectors that contain conventional control sequences for high level expression, and then be incubated
with synthetic gene transfer molecules such as polymeric DNA-binding cations like polylysine,
protamine, and albumin, linked to cell targeting ligands such as asialoorosomucoid, insulin,
galactose, lactose, or transferrin. Other delivery systems include the use of liposomes to encapsulate
DNA comprising the gene under the control of a variety of tissue-specific or ubiquitously-active
promoters. Further non-viral delivery suitable for use includes mechanical delivery systems such
as the approach described in Woffendin et al (1994) Proc. Natl Acad. Sci. USA
91(24):11581-11585. Moreover, the coding sequence and the product of expression of such can be
delivered through deposition of photopolymerized hydrogel materials. Other conventional methods
for gene delivery that can be used for delivery of the coding sequence include, for example, use of
hand-held gene transfer particle gun, as described in US 5,149,655; use of ionizing radiation for
activating transferred gene, as described in US 5,206,152 and WO92/11033.

Exemplary liposome and polycationic gene delivery vehicles are those described in US 5,422,120 
and 4,762,915; in WO95/13796; WO94/23697; and WO91/14445; in EP-0524968; and in Stryer,
Biochemistry, pages 236-240 (1975) W.H. Freeman, San Francisco; Szoka (1980) Biochem
Biophys Acta 600:1; Bayer (1979) Biochem Biophys Acta 550:464; Rivnay (1987) Meth Enzymol
149:119; Wang (1987) Proc Natl Acad Sci 84:7851; Plant (1989) Anal Biochem 176:420.

A polynucleotide composition can comprise a therapeutically effective amount of a gene therapy
vehicle, as the term is defined above. For purposes of the present invention, an effective dose will
be from about 0.01 mg/kg to 50 mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA constructs
in the individual to which it is administered.
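
As a worked illustration of the dose ranges just stated, the following Python sketch converts the per-kilogram bounds into absolute amounts for a given body mass. The function name, the constant names and the 70 kg example mass are assumptions introduced for illustration; only the numerical ranges come from the text.

```python
# Dose ranges quoted in the text, in mg of DNA construct per kg of body mass.
BROAD_RANGE_MG_PER_KG = (0.01, 50.0)
NARROW_RANGE_MG_PER_KG = (0.05, 10.0)


def dose_bounds_mg(body_mass_kg, range_mg_per_kg):
    """Return the (lower, upper) absolute dose in mg for a given body mass."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    low, high = range_mg_per_kg
    return low * body_mass_kg, high * body_mass_kg


if __name__ == "__main__":
    # Hypothetical 70 kg subject, used only as an example.
    print(dose_bounds_mg(70.0, BROAD_RANGE_MG_PER_KG))
    print(dose_bounds_mg(70.0, NARROW_RANGE_MG_PER_KG))
```

For a 70 kg subject the broad range corresponds to roughly 0.7 mg to 3500 mg of DNA construct, and the narrower range to roughly 3.5 mg to 700 mg.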

Delivery Methods

Once formulated, the polynucleotide compositions of the invention can be (1) administered directly
to the subject; (2) delivered ex vivo, to cells derived from the subject; or (3) delivered in vitro for expression
of recombinant proteins. The subjects to be treated can be mammals or birds. Also, human subjects
can be treated.

Direct delivery of the compositions will generally be accomplished by injection, either
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the interstitial
space of a tissue. The compositions can also be administered into a lesion. Other modes of
administration include oral and pulmonary administration, suppositories, and transdermal or
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage
treatment may be a single dose schedule or a multiple dose schedule.

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known
in the art and described in eg. WO93/14778. Examples of cells useful in ex vivo applications
include, for example, stem cells, particularly hematopoietic, lymph cells, macrophages, dendritic
cells, or tumor cells.

Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished
by the following procedures, for example, dextran-mediated transfection, calcium phosphate
precipitation, polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of
the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei, all well
known in the art.

Polynucleotide and polypeptide pharmaceutical compositions

In addition to the pharmaceutically acceptable carriers and salts described above, the following
additional agents can be used with polynucleotide and/or polypeptide compositions.

A. Polypeptides

One example is polypeptides, which include, without limitation: asialoorosomucoid (ASOR);
transferrin; asialoglycoproteins; antibodies; antibody fragments; ferritin; interleukins; interferons;
granulocyte-macrophage colony stimulating factor (GM-CSF), granulocyte colony stimulating
factor (G-CSF), macrophage colony stimulating factor (M-CSF), stem cell factor and
erythropoietin. Viral antigens, such as envelope proteins, can also be used. Also, proteins from
other invasive organisms can be used, such as the 17 amino acid peptide from the circumsporozoite protein of
Plasmodium falciparum known as RII.

B. Hormones, Vitamins, etc.

Other groups that can be included are, for example: hormones, steroids, androgens, estrogens,
thyroid hormone, or vitamins, folic acid.

C. Polyalkylenes, Polysaccharides, etc.

Also, polyalkylene glycol can be included with the desired polynucleotides/polypeptides. In a
preferred embodiment, the polyalkylene glycol is polyethylene glycol. In addition, mono-, di-, or
polysaccharides can be included. In a preferred embodiment of this aspect, the polysaccharide is
dextran or DEAE-dextran. Chitosan and poly(lactide-co-glycolide) can also be included.

D. Lipids and Liposomes

The desired polynucleotide/polypeptide can also be encapsulated in lipids or packaged in liposomes 
prior to delivery to the subject or to cells derived therefrom. 

Lipid encapsulation is generally accomplished using liposomes which are able to stably bind or
entrap and retain nucleic acid. The ratio of condensed polynucleotide to lipid preparation can vary
but will generally be around 1:1 (mg DNA:micromoles lipid), or more of lipid. For a review of the
use of liposomes as carriers for delivery of nucleic acids, see, Hug and Sleight (1991) Biochim.
Biophys. Acta. 1097:1-17; Straubinger (1983) Meth. Enzymol. 101:512-527.

Liposomal preparations for use in the present invention include cationic (positively charged),
anionic (negatively charged) and neutral preparations. Cationic liposomes have been shown to
mediate intracellular delivery of plasmid DNA (Felgner (1987) Proc. Natl. Acad. Sci. USA
84:7413-7416); mRNA (Malone (1989) Proc. Natl. Acad. Sci. USA 86:6077-6081); and purified
transcription factors (Debs (1990) J. Biol. Chem. 265:10189-10192), in functional form.

Cationic liposomes are readily available. For example, N-[1-(2,3-dioleyloxy)propyl]-N,N,N-triethylammonium
(DOTMA) liposomes are available under the trademark Lipofectin, from GIBCO BRL, Grand
Island, NY. (See, also, Felgner supra). Other commercially available liposomes include
transfectace (DDAB/DOPE) and DOTAP/DOPE (Boehringer). Other cationic liposomes can be
prepared from readily available materials using techniques well known in the art. See, eg. Szoka
(1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; WO90/11092 for a description of the synthesis
of DOTAP (1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes.

Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids
(Birmingham, AL), or can be easily prepared using readily available materials. Such materials include
phosphatidyl choline, cholesterol, phosphatidyl ethanolamine, dioleoylphosphatidyl choline (DOPC),
dioleoylphosphatidyl glycerol (DOPG), dioleoylphosphatidyl ethanolamine (DOPE), among others.
These materials can also be mixed with the DOTMA and DOTAP starting materials in appropriate
ratios. Methods for making liposomes using these materials are well known in the art.

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar vesicles (SUVs),
or large unilamellar vesicles (LUVs). The various liposome-nucleic acid complexes are prepared
using methods known in the art. See eg. Straubinger (1983) Meth. Enzymol. 101:512-527; Szoka
(1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; Papahadjopoulos (1975) Biochim. Biophys. Acta
394:483; Wilson (1979) Cell 17:77; Deamer & Bangham (1976) Biochim. Biophys. Acta 443:629;
Ostro (1977) Biochem. Biophys. Res. Commun. 76:836; Fraley (1979) Proc. Natl. Acad. Sci. USA
76:3348; Enoch & Strittmatter (1979) Proc. Natl. Acad. Sci. USA 76:145; Fraley (1980) J. Biol.
Chem. 255:10431; Szoka & Papahadjopoulos (1978) Proc. Natl. Acad. Sci. USA 75:145;
and Schaefer-Ridder (1982) Science 215:166.




E. Lipoproteins

In addition, lipoproteins can be included with the polynucleotide/polypeptide to be delivered.
Examples of lipoproteins to be utilized include: chylomicrons, HDL, IDL, LDL, and VLDL. Mutants,
fragments, or fusions of these proteins can also be used. Also, modifications of naturally occurring
lipoproteins can be used, such as acetylated LDL. These lipoproteins can target the delivery of
polynucleotides to cells expressing lipoprotein receptors. Preferably, if lipoproteins are included with
the polynucleotide to be delivered, no other targeting ligand is included in the composition.

Naturally occurring lipoproteins comprise a lipid and a protein portion. The protein portions are
known as apoproteins. At present, apoproteins A, B, C, D, and E have been isolated and
identified. At least two of these contain several proteins, designated by Roman numerals, AI, AII,
AIV; CI, CII, CIII.

A lipoprotein can comprise more than one apoprotein. For example, naturally occurring
chylomicrons comprise A, B, C, and E; over time these lipoproteins lose A and acquire C and
E apoproteins. VLDL comprises A, B, C, and E apoproteins; LDL comprises apoprotein B; and
HDL comprises apoproteins A, C, and E.

The amino acid sequences of these apoproteins are known and are described in, for example, Breslow (1985)
Annu Rev. Biochem 54:699; Law (1986) Adv. Exp Med. Biol. 151:162; Chen (1986) J Biol Chem
261:12918; Kane (1980) Proc Natl Acad Sci USA 77:2465; and Utermann (1984) Hum Genet 65:232.

Lipoproteins contain a variety of lipids including triglycerides, cholesterol (free and esters), and
phospholipids. The composition of the lipids varies in naturally occurring lipoproteins. For example,
chylomicrons comprise mainly triglycerides. A more detailed description of the lipid content of
naturally occurring lipoproteins can be found, for example, in Meth. Enzymol. 128 (1986). The
composition of the lipids is chosen to aid in conformation of the apoprotein for receptor binding
activity. The composition of lipids can also be chosen to facilitate hydrophobic interaction and
association with the polynucleotide binding molecule.

Naturally occurring lipoproteins can be isolated from serum by ultracentrifugation, for instance.
Such methods are described in Meth. Enzymol. (supra); Pitas (1980) J. Biol. Chem. 255:5454-5460
and Mahley (1979) J Clin. Invest 64:743-750. Lipoproteins can also be produced by in vitro or
recombinant methods by expression of the apoprotein genes in a desired host cell. See, for example,
Atkinson (1986) Annu Rev Biophys Chem 15:403 and Radding (1958) Biochim Biophys Acta 30:443.
Lipoproteins can also be purchased from commercial suppliers, such as Biomedical
Technologies, Inc., Stoughton, Massachusetts, USA. Further description of lipoproteins can be
found in Zuckermann et al PCT/US97/14465.

F. Polycationic Agents

Polycationic agents can be included, with or without lipoprotein, in a composition with the desired
polynucleotide/polypeptide to be delivered.

Polycationic agents typically exhibit a net positive charge at physiologically relevant pH and are
capable of neutralizing the electrical charge of nucleic acids to facilitate delivery to a desired
location. These agents have in vitro, ex vivo, and in vivo applications. Polycationic agents can
be used to deliver nucleic acids to a living subject intramuscularly, subcutaneously, etc.

The following are examples of useful polypeptides as polycationic agents: polylysine, polyarginine,
polyornithine, and protamine. Other examples include histones, protamines, human serum albumin,
DNA binding proteins, non-histone chromosomal proteins, and coat proteins from DNA viruses, such
as φX174. Transcriptional factors also contain domains that bind DNA and therefore may be useful
as nucleic acid condensing agents. Briefly, transcriptional factors such as C/CEBP, c-jun, c-fos,
AP-1, AP-2, AP-3, CPF, Prot-1, Sp-1, Oct-1, Oct-2, CREP, and TFIID contain basic domains that
bind DNA sequences.

Organic polycationic agents include: spermine, spermidine, and putrescine.

The dimensions and physical properties of a polycationic agent can be extrapolated from the
list above, to construct other polypeptide polycationic agents or to produce synthetic polycationic
agents.

Synthetic polycationic agents which are useful include, for example, DEAE-dextran and polybrene.
Lipofectin™ and lipofectAMINE™ are monomers that form polycationic complexes when
combined with polynucleotides/polypeptides.

Immunodiagnostic Assays

Neisserial antigens of the invention can be used in immunoassays to detect antibody levels (or, 
conversely, anti-Neisserial antibodies can be used to detect antigen levels). Immunoassays based 
on well defined, recombinant antigens can be developed to replace invasive diagnostic methods.
Antibodies to Neisserial proteins within biological samples, including for example, blood or serum
samples, can be detected. Design of the immunoassays is subject to a great deal of variation, and
a variety of these are known in the art. Protocols for the immunoassay may be based, for example, 
upon competition, or direct reaction, or sandwich type assays. Protocols may also, for example, use 
solid supports, or may be by immunoprecipitation. Most assays involve the use of labeled antibody 
or polypeptide; the labels may be, for example, fluorescent, chemiluminescent, radioactive, or dye
molecules. Assays which amplify the signals from the probe are also known; examples of which 
are assays which utilize biotin and avidin, and enzyme-labeled and mediated immunoassays, such 
as ELISA assays. 

Kits suitable for immunodiagnosis and containing the appropriate labeled reagents are constructed 
by packaging the appropriate materials, including the compositions of the invention, in suitable
containers, along with the remaining reagents and materials (for example, suitable buffers, salt
solutions, etc) required for the conduct of the assay, as well as a suitable set of assay instructions.

Nucleic Acid Hybridisation 

"Hybridization" refers to the association of two nucleic acid sequences to one another by hydrogen 
bonding. Typically, one sequence will be fixed to a solid support and the other will be free in solution.
Then, the two sequences will be placed in contact with one another under conditions that favor 
hydrogen bonding. Factors that affect this bonding include: the type and volume of solvent; reaction 
temperature; time of hybridization; agitation; agents to block the non-specific attachment of the liquid 
phase sequence to the solid support (Denhardt's reagent or BLOTTO); concentration of the sequences; 
use of compounds to increase the rate of association of sequences (dextran sulfate or polyethylene
glycol); and the stringency of the washing conditions following hybridization. See Sambrook et al 
[supra] Volume 2, chapter 9, pages 9.47 to 9.57. 

"Stringency" refers to conditions in a hybridization reaction that favor association of very similar 
sequences over sequences that differ. For example, the combination of temperature and salt
concentration should be chosen so that it is approximately 12° to 20°C below the calculated Tm of the
hybrid under study. The temperature and salt conditions can often be determined empirically in 
preliminary experiments in which samples of genomic DNA immobilized on filters are hybridized 
to the sequence of interest and then washed under conditions of different stringencies. See 
Sambrook et al. at page 9.50. 

Variables to consider when performing, for example, a Southern blot are (1) the complexity of the
DNA being blotted and (2) the homology between the probe and the sequences being detected. The
total amount of the fragment(s) to be studied can vary by a magnitude of 10, from 0.1 to 1 µg for a
plasmid or phage digest to 10⁻⁹ to 10⁻⁸ g for a single copy gene in a highly complex eukaryotic
genome. For lower complexity polynucleotides, substantially shorter blotting, hybridization, and
exposure times, a smaller amount of starting polynucleotides, and lower specific activity of probes
can be used. For example, a single-copy yeast gene can be detected with an exposure time of only
1 hour starting with 1 µg of yeast DNA, blotting for two hours, and hybridizing for 4-8 hours with
a probe of 10⁸ cpm/µg. For a single-copy mammalian gene a conservative approach would start
with 10 µg of DNA, blot overnight, and hybridize overnight in the presence of 10% dextran sulfate
using a probe of greater than 10⁸ cpm/µg, resulting in an exposure time of ~24 hours.

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid between the probe
and the fragment of interest, and consequently, the appropriate conditions for hybridization and
washing. In many cases the probe is not 100% homologous to the fragment. Other commonly
encountered variables include the length and total G+C content of the hybridizing sequences and
the ionic strength and formamide content of the hybridization buffer. The effects of all of these
factors can be approximated by a single equation:

Tm = 81 + 16.6(log10 Ci) + 0.4[%(G + C)] - 0.6(%formamide) - 600/n - 1.5(%mismatch)

where Ci is the salt concentration (monovalent ions) and n is the length of the hybrid in base pairs 
(slightly modified from Meinkoth & Wahl (1984) Anal Biochem. 138: 267-284). 
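
The equation can be evaluated directly. The following Python sketch is an illustration only: the function and parameter names are assumptions, the length term is read as 600/n, and the mismatch coefficient is read as 1.5.

```python
import math


def melting_temperature_c(salt_molar, gc_percent, formamide_percent,
                          length_bp, mismatch_percent=0.0):
    """Approximate Tm (degrees C) of a DNA-DNA hybrid.

    salt_molar        -- Ci, monovalent ion concentration in mol/l
    gc_percent        -- %(G + C) of the hybridizing sequence
    formamide_percent -- % formamide in the hybridization buffer
    length_bp         -- n, length of the hybrid in base pairs
    mismatch_percent  -- % mismatch between probe and target
    """
    if salt_molar <= 0:
        raise ValueError("salt concentration must be positive")
    if length_bp <= 0:
        raise ValueError("hybrid length must be positive")
    return (81.0
            + 16.6 * math.log10(salt_molar)
            + 0.4 * gc_percent
            - 0.6 * formamide_percent
            - 600.0 / length_bp
            - 1.5 * mismatch_percent)


if __name__ == "__main__":
    # Hypothetical inputs: 0.1 M salt, 50% G+C, 50% formamide,
    # a 300 bp hybrid and 10% mismatch.
    print(round(melting_temperature_c(0.1, 50.0, 50.0, 300, 10.0), 1))
```

With these example inputs the terms are 81 - 16.6 + 20 - 30 - 2 - 15, giving a Tm of about 37.4°C. Lower salt, more formamide, a shorter hybrid or more mismatch each lower the estimate.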

In designing a hybridization experiment, some factors affecting nucleic acid hybridization can be
conveniently altered. The temperature of the hybridization and washes and the salt concentration
during the washes are the simplest to adjust. As the temperature of the hybridization increases (ie.
as stringency increases), it becomes less likely for hybridization to occur between strands that are
nonhomologous, and as a result, background decreases. If the radiolabeled probe is not completely
homologous with the immobilized fragment (as is frequently the case in gene family and
interspecies hybridization experiments), the hybridization temperature must be reduced, and
background will increase. The temperature of the washes affects the intensity of the hybridizing
band and the degree of background in a similar manner. The stringency of the washes is also
increased with decreasing salt concentrations.

In general, convenient hybridization temperatures in the presence of 50% formamide are 42°C for
a probe which is 95% to 100% homologous to the target fragment, 37°C for 90% to 95% homology,
and 32°C for 85% to 90% homology. For lower homologies, formamide content should be lowered
and temperature adjusted accordingly, using the equation above. If the homology between the probe
and the target fragment is not known, the simplest approach is to start with both hybridization and
wash conditions which are nonstringent. If non-specific bands or high background are observed
after autoradiography, the filter can be washed at high stringency and reexposed. If the time
required for exposure makes this approach impractical, several hybridization and/or washing
stringencies should be tested in parallel.
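
The temperature guidance for 50% formamide can be written as a simple lookup. The Python sketch below is illustrative only: the function name is an assumption, and because the stated homology bands share their end points (90% and 95%), the sketch assumes a boundary value belongs to the higher band.

```python
def hybridization_temp_c_50_formamide(homology_percent):
    """Convenient hybridization temperature (degrees C) in 50% formamide.

    Returns None below 85% homology, where the text advises lowering the
    formamide content and recalculating the temperature instead.
    """
    if not 0 <= homology_percent <= 100:
        raise ValueError("homology must be between 0 and 100 percent")
    if homology_percent >= 95:
        return 42
    if homology_percent >= 90:
        return 37
    if homology_percent >= 85:
        return 32
    return None


if __name__ == "__main__":
    for homology in (98, 92, 87, 70):
        print(homology, hybridization_temp_c_50_formamide(homology))
```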

Nucleic Acid Probe Assays 

Methods such as PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid 
probes according to the invention can determine the presence of cDNA or mRNA. A probe is said
to "hybridize" with a sequence of the invention if it can form a duplex or double stranded complex, 
which is stable enough to be detected. 

The nucleic acid probes will hybridize to the Neisserial nucleotide sequences of the invention 
(including both sense and antisense strands). Though many different nucleotide sequences will 
encode the amino acid sequence, the native Neisserial sequence is preferred because it is the actual
sequence present in cells. mRNA represents a coding sequence and so a probe should be 
complementary to the coding sequence; single-stranded cDNA is complementary to mRNA, and 
so a cDNA probe should be complementary to the non-coding sequence. 
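
The strand relationships described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: sequences are plain DNA strings written 5' to 3', the function names are hypothetical, and the example sequence is invented rather than taken from any SEQ ID.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'.

    Characters other than A, C, G, T are passed through unchanged.
    """
    return seq.translate(_COMPLEMENT)[::-1]


def probe_for_mrna(coding_dna):
    """Probe for mRNA: complementary to the coding sequence."""
    return reverse_complement(coding_dna)


def probe_for_first_strand_cdna(coding_dna):
    """Probe for single-stranded cDNA, which is itself complementary to
    mRNA: complementary to the non-coding strand, i.e. same sense as the
    coding sequence."""
    return coding_dna


if __name__ == "__main__":
    coding = "ATGGCC"  # invented example sequence
    print(probe_for_mrna(coding))
    print(probe_for_first_strand_cdna(coding))
```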

The probe sequence need not be identical to the Neisserial sequence (or its complement) — some 
variation in the sequence and length can lead to increased assay sensitivity if the nucleic acid probe
can form a duplex with target nucleotides, which can be detected. Also, the nucleic acid probe can
include additional nucleotides to stabilize the formed duplex. Additional Neisserial sequence may
also be helpful as a label to detect the formed duplex. For example, a non-complementary
nucleotide sequence may be attached to the 5' end of the probe, with the remainder of the probe
sequence being complementary to a Neisserial sequence. Alternatively, non-complementary bases
or longer sequences can be interspersed into the probe, provided that the probe sequence has
sufficient complementarity with a Neisserial sequence in order to hybridize therewith and
thereby form a duplex which can be detected. 

The exact length and sequence of the probe will depend on the hybridization conditions, such as 
temperature, salt condition and the like. For example, for diagnostic applications, depending on the
complexity of the analyte sequence, the nucleic acid probe typically contains at least 10-20
nucleotides, preferably 15-25, and more preferably at least 30 nucleotides, although it may be 
shorter than this. Short primers generally require cooler temperatures to form sufficiently stable 
hybrid complexes with the template. 

Probes may be produced by synthetic procedures, such as the triester method of Matteucci et al
[J. Am. Chem. Soc. (1981) 103:3185], or according to Urdea et al [Proc. Natl. Acad. Sci. USA
(1983) 80:7461], or using commercially available automated oligonucleotide synthesizers.

The chemical nature of the probe can be selected according to preference. For certain applications, 
DNA or RNA are appropriate. For other applications, modifications may be incorporated; eg.
backbone modifications, such as phosphorothioates or methylphosphonates, can be used to increase
in vivo half-life, alter RNA affinity, increase nuclease resistance etc. [eg. see Agrawal & Iyer 
(1995) Curr Opin Biotechnol 6:12-19; Agrawal (1996) TIBTECH 14:376-387]; analogues such as 
peptide nucleic acids may also be used [eg. see Corey (1997) TIBTECH 15:224-229; Buchardt et 
al. (1993) TIBTECH 11:384-386]. 

Alternatively, the polymerase chain reaction (PCR) is another well-known means for detecting
small amounts of target nucleic acids. The assay is described in: Mullis et al [Meth. Enzymol.
(1987) 155:335-350]; US patents 4,683,195 and 4,683,202. Two "primer" nucleotides hybridize
with the target nucleic acids and are used to prime the reaction. The primers can comprise sequence
that does not hybridize to the sequence of the amplification target (or its complement) to aid with
duplex stability or, for example, to incorporate a convenient restriction site. Typically, such
sequence will flank the desired Neisserial sequence.
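
As an illustration of primers that carry non-hybridizing sequence, the Python sketch below builds a primer pair in which a 5' tail containing a restriction site flanks the region that anneals to the target. Everything here is an assumption for illustration: the function names, the default annealing length of 18 nucleotides, the example target, and the choice of the EcoRI (GAATTC) and BamHI (GGATCC) recognition sites as tails.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence (A, C, G, T)."""
    return seq.translate(_COMPLEMENT)[::-1]


def tailed_primer_pair(target, anneal_len=18, fwd_tail="", rev_tail=""):
    """Return (forward, reverse) primers written 5'->3'.

    Each primer is a 5' tail that does not hybridize to the target,
    followed by a 3' region that anneals to one end of the target.
    """
    target = target.upper()
    if anneal_len <= 0 or len(target) < anneal_len:
        raise ValueError("target is shorter than the annealing length")
    forward = fwd_tail.upper() + target[:anneal_len]
    reverse = rev_tail.upper() + reverse_complement(target[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    # Invented 18-nt target; a short annealing length keeps the demo readable.
    fwd, rev = tailed_primer_pair("ATGAAACCCGGGTTTTAG", anneal_len=6,
                                  fwd_tail="GAATTC", rev_tail="GGATCC")
    print(fwd)
    print(rev)
```

After amplification the tails become part of the product, so the restriction sites end up flanking the amplified sequence.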

A thermostable polymerase creates copies of target nucleic acids from the primers using the 
original target nucleic acids as a template. After a threshold amount of target nucleic acids are 
generated by the polymerase, they can be detected by more traditional methods, such as Southern 
blots. When using the Southern blot method, the labelled probe will hybridize to the Neisserial
sequence (or its complement). 

Also, mRNA or cDNA can be detected by traditional blotting techniques described in Sambrook 
et al [supra]. mRNA, or cDNA generated from mRNA using a polymerase enzyme, can be purified 
and separated using gel electrophoresis. The nucleic acids on the gel are then blotted onto a solid 
support, such as nitrocellulose. The solid support is exposed to a labelled probe and then washed
to remove any unhybridized probe. Next, the duplexes containing the labeled probe are detected.
Typically, the probe is labelled with a radioactive moiety. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figures 1-20 show biochemical data obtained in the Examples, and also sequence analysis, for
ORFs 37, 5, 2, 15, 22, 28, 32, 4, 61, 76, 89, 97, 106, 138, 23, 25, 27, 79, 85 and 132. M1 and M2
are molecular weight markers. Arrows indicate the position of the main recombinant product or,
in Western blots, the position of the main N.meningitidis immunoreactive band. TP indicates
N.meningitidis total protein extract; OMV indicates N.meningitidis outer membrane vesicle
preparation. In bactericidal assay results: a diamond (♦) shows preimmune data; a triangle (▲)
shows GST control data; a circle (●) shows data with recombinant N.meningitidis protein.
Computer analyses show a hydrophilicity plot (upper), an antigenic index plot (middle), and an
AMPHI analysis (lower). The AMPHI program has been used to predict T-cell epitopes [Gao et
al. (1989) J. Immunol. 143:3007; Roberts et al. (1996) AIDS Res Hum Retrovir 12:593; Quakyi et
al. (1992) Scand J Immunol suppl. 11:9] and is available in the Protean package of DNASTAR, Inc.
(1228 South Park Street, Madison, Wisconsin 53715 USA).

EXAMPLES 

The examples describe nucleic acid sequences which have been identified in N.meningitidis, along 
with their putative translation products, and also those of N.gonorrhoeae. Not all of the nucleic acid
sequences are complete ie. they encode less than the full-length wild-type protein. 

The examples are generally in the following format:

• a nucleotide sequence which has been identified in N.meningitidis (strain B)

• the putative translation product of this sequence

• a computer analysis of the translation product based on database comparisons

• corresponding gene and protein sequences identified in N.meningitidis (strain A) and in N.gonorrhoeae

• a description of the characteristics of the proteins which indicates that they might be suitably antigenic

• results of biochemical analysis (expression, purification, ELISA, FACS etc.)

The examples typically include details of sequence identity between species and strains. Proteins
that are similar in sequence are generally similar in both structure and function, and the sequence
identity often indicates a common evolutionary origin. Comparison with sequences of proteins of
known function is widely used as a guide for the assignment of putative protein function to a new
sequence and has proved particularly useful in whole-genome analyses.

Sequence comparisons were performed at NCBI (http://www.ncbi.nlm.nih.gov) using the
algorithms BLAST, BLAST2, BLASTn, BLASTp, tBLASTn, BLASTx, & tBLASTx [eg. see also
Altschul et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Research 25:3389-3402]. Searches were performed against the
following databases: non-redundant GenBank+EMBL+DDBJ+PDB sequences and non-redundant
GenBank CDS translations+PDB+SwissProt+SPupdate+PIR sequences.

To compare Meningococcal and Gonococcal sequences, the tBLASTx algorithm was used, as 
implemented at http://www.genome.ou.edu/gono_blast.html. The FASTA algorithm was also used 
to compare the ORFs (from GCG Wisconsin Package, version 9.0). 

Dots within nucleotide sequences (eg. position 495 in SEQ ID 11) represent nucleotides which have
been arbitrarily introduced in order to maintain a reading frame. In the same way, double-underlined
nucleotides were removed. Lower case letters (eg. position 496 in SEQ ID 11) represent
ambiguities which arose during alignment of independent sequencing reactions (some of the
nucleotide sequences in the examples are derived from combining the results of two or more
experiments).

Nucleotide sequences were scanned in all six reading frames to predict the presence of hydrophobic 
domains using an algorithm based on the statistical studies of Esposti et al. [Critical evaluation of 
the hydropathy of membrane proteins (1990) Eur J Biochem 190:207-219]. These domains 
represent potential transmembrane regions or hydrophobic leader sequences. 
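
As a rough illustration of this kind of scan, the sketch below computes a sliding-window hydropathy profile for a translated sequence and reports windows above a cut-off. It does not reproduce the algorithm of Esposti et al.: the Kyte-Doolittle scale, the 19-residue window, the 1.6 cut-off and the function names are all assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values (an assumed scale, not the Esposti et al. statistics).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]  # unknown residues score 0
    if len(scores) < window:
        return []
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def hydrophobic_windows(protein, window=19, cutoff=1.6):
    """Return 0-based start positions of windows whose mean exceeds the cut-off."""
    profile = hydropathy_profile(protein, window)
    return [i for i, mean in enumerate(profile) if mean > cutoff]
```

Applied to each of the six conceptual translations of a nucleotide sequence, the positions returned would mark candidate transmembrane regions or leader sequences for closer inspection.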

Open reading frames were predicted from fragmented nucleotide sequences using the program
ORFFINDER (NCBI).
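
The idea of an open reading frame scan can be sketched as follows. This is not the NCBI program: it is a minimal illustration that assumes the standard genetic code, ATG as the only start codon, and a hypothetical `min_codons` length threshold.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement; characters other than ACGT are kept as-is."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_codons=30):
    """Return (strand, frame, start, end, protein) for ATG...stop ORFs in six frames.

    Coordinates are 0-based and, for the "-" strand, refer to the
    reverse-complemented sequence.
    """
    results = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and CODON_TABLE.get(codon, "X") == "*":
                    if (i - start) // 3 >= min_codons:
                        protein = "".join(CODON_TABLE.get(seq[j:j + 3], "X")
                                          for j in range(start, i, 3))
                        results.append((strand, frame, start, i + 3, protein))
                    start = None
    return results
```

Codons containing ambiguous characters are translated as X, so partial or ambiguous reads still yield a candidate product.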

Underlined amino acid sequences indicate possible transmembrane domains or leader sequences
in the ORFs, as predicted by the PSORT algorithm (http://www.psort.nibb.ac.jp). Functional
domains were also predicted using the MOTIFS program (GCG Wisconsin & PROSITE).

Various tests can be used to assess the in vivo immunogenicity of the proteins identified in the
examples. For example, the proteins can be expressed recombinantly and used to screen patient sera
by immunoblot. A positive reaction between the protein and patient serum indicates that the patient
has previously mounted an immune response to the protein in question ie. the protein is an
immunogen. This method can also be used to identify immunodominant proteins.

The recombinant protein can also be conveniently used to prepare antibodies eg. in a mouse. These 
can be used for direct confirmation that a protein is located on the cell-surface. Labelled antibody 
(eg. fluorescent labelling for FACS) can be incubated with intact bacteria and the presence of label 
on the bacterial surface confirms the location of the protein. 

In particular, the following methods (A) to (S) were used to express, purify and biochemically
characterise the proteins of the invention:

A) Chromosomal DNA preparation 

N.meningitidis strain 2996 was grown to exponential phase in 100ml of GC medium, harvested by
centrifugation, and resuspended in 5ml buffer (20% sucrose, 50mM Tris-HCl, 50mM EDTA, pH8).
After 10 minutes incubation on ice, the bacteria were lysed by adding 10ml lysis solution (50mM
NaCl, 1% Na-Sarkosyl, 50µg/ml Proteinase K), and the suspension was incubated at 37°C for 2
hours. Two phenol extractions (equilibrated to pH 8) and one CHCl3/isoamylalcohol (24:1)
extraction were performed. DNA was precipitated by addition of 0.3M sodium acetate and 2
volumes ethanol, and was collected by centrifugation. The pellet was washed once with 70%
ethanol and redissolved in 4ml buffer (10mM Tris-HCl, 1mM EDTA, pH 8). The DNA
concentration was measured by reading the OD at 260 nm.
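
The conversion from an OD reading at 260 nm to a DNA concentration is not spelled out in the text. A minimal sketch, assuming the usual convention that one OD260 unit corresponds to about 50µg/ml of double-stranded DNA in a 1 cm cuvette (the dilution factor and function name are hypothetical):

```python
def dsdna_ug_per_ml(od260, dilution_factor=1.0, ug_per_ml_per_od=50.0):
    """Estimate double-stranded DNA concentration (ug/ml) from an OD260 reading.

    Assumes a 1 cm path length and ~50 ug/ml per OD260 unit; neither value
    is stated in the text.
    """
    return od260 * dilution_factor * ug_per_ml_per_od


# Example: a 1:20 dilution reading 0.25 corresponds to 250 ug/ml,
# i.e. about 1000 ug in the 4 ml in which the pellet was redissolved.
concentration = dsdna_ug_per_ml(0.25, dilution_factor=20)
total_ug = concentration * 4
```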

B) Oligonucleotide design 

Synthetic oligonucleotide primers were designed on the basis of the coding sequence of each ORF,
using (a) the meningococcus B sequence when available, or (b) the gonococcus/meningococcus A
sequence, adapted to the codon preference usage of meningococcus as necessary. Any predicted
signal peptides were omitted, by deducing the 5'-end amplification primer sequence immediately
downstream from the predicted leader sequence.

For most ORFs, the 5' primers included two restriction enzyme recognition sites (BamHI-NdeI,
BamHI-NheI, or EcoRI-NheI, depending on the gene's own restriction pattern); the 3' primers included
a XhoI restriction site. This procedure was established in order to direct the cloning of each
amplification product (corresponding to each ORF) into two different expression systems: pGEX-KG
(using either BamHI-XhoI or EcoRI-XhoI), and pET21b+ (using either NdeI-XhoI or NheI-XhoI).

5'-end primer tail:   CGCGGATCCCATATG     (BamHI-NdeI)
                      CGCGGATCCGCTAGC     (BamHI-NheI)
                      CCGGAATTCTAGCTAGC   (EcoRI-NheI)
3'-end primer tail:   CCCGCTCGAG          (XhoI)

For ORFs 5, 15, 17, 19, 20, 22, 27, 28, 65 & 89, two different amplifications were performed to
clone each ORF in the two expression systems. Two different 5' primers were used for each ORF;
the same 3' XhoI primer was used as before:

5'-end primer tail:   GGAATTCCATATGGCCATGG   (NdeI)
5'-end primer tail:   CGGGATCC               (BamHI)

ORF 76 was cloned in the pTRC expression vector and expressed as an amino-terminus His-tag
fusion. In this particular case, the predicted signal peptide was included in the final product.
NheI-BamHI restriction sites were incorporated using primers:

5'-end primer tail:   GATCAGCTAGCCATATG   (NheI)
3'-end primer tail:   CGGGATCC            (BamHI)

As well as containing the restriction enzyme recognition sequences, the primers included
nucleotides which hybridized to the sequence to be amplified. The number of hybridizing
nucleotides depended on the melting temperature of the whole primer, and was determined for each
primer using the formulae:

Tm = 4 (G+C) + 2 (A+T)   (tail excluded)

Tm = 64.9 + 0.41 (% GC) - 600/N   (whole primer)

The average melting temperatures of the selected oligos were 65-70°C for the whole oligo and
50-55°C for the hybridising region alone.
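
The two melting temperature estimates can be sketched as below. The constant 64.9 is taken from the commonly quoted form of the whole-primer formula and is exposed as a parameter because it is an assumption here; the function names and the 18-nucleotide hybridising region in the usage note are illustrative only.

```python
def tm_hybridising(seq):
    """4(G+C) + 2(A+T), applied to the hybridising region only (tail excluded)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return 4 * gc + 2 * at


def tm_whole_primer(seq, base=64.9):
    """base + 0.41(%GC) - 600/N, applied to the whole primer (tail included)."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("empty primer")
    gc_percent = 100.0 * (s.count("G") + s.count("C")) / n
    return base + 0.41 * gc_percent - 600.0 / n


tail = "CGCGGATCCCATATG"              # BamHI-NdeI tail listed above
hybridising = "GATGACGTATCGGATTTT"    # illustrative region, not a primer from Table I
tm_region = tm_hybridising(hybridising)
tm_full = tm_whole_primer(tail + hybridising)
```

For this illustrative primer the hybridising region gives 50°C and the whole oligo about 66.6°C, which fall inside the ranges quoted above.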

Table I (page 487) shows the forward and reverse primers used for each amplification. In certain
cases, it will be noted that the sequence of the primer does not exactly match the sequence in the
ORF. When initial amplifications were performed, the complete 5' and/or 3' sequence was not
known for some meningococcal ORFs, although the corresponding sequences had been identified
in gonococcus. For amplification, the gonococcal sequences could thus be used as the basis for
primer design, altered to take account of codon preference. In particular, the following codons were
changed: ATA→ATT; TCG→TCT; CAG→CAA; AAG→AAA; GAG→GAA; CGA→CGC;
CGG→CGC; GGG→GGC. Italicised nucleotides in Table I indicate such a change. It will be
appreciated that, once the complete sequence has been identified, this approach is generally no
longer necessary.
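
The codon changes listed above are all synonymous, so they alter the primer without altering the encoded protein. A minimal sketch of applying them in frame (the function name is hypothetical, and the input is assumed to start at the first base of a codon):

```python
# Codon substitutions listed in the text (gonococcal -> preferred codon).
CODON_CHANGES = {
    "ATA": "ATT", "TCG": "TCT", "CAG": "CAA", "AAG": "AAA",
    "GAG": "GAA", "CGA": "CGC", "CGG": "CGC", "GGG": "GGC",
}


def adapt_codons(coding_seq):
    """Apply the substitutions codon by codon, reading in frame from position 0.

    A trailing partial codon, if any, is left unchanged.
    """
    s = coding_seq.upper()
    codons = [s[i:i + 3] for i in range(0, len(s), 3)]
    return "".join(CODON_CHANGES.get(codon, codon) for codon in codons)
```

Reading in frame matters: a triplet such as AAG that spans two codons must not be replaced, which is why the sketch walks the sequence in steps of three rather than using a plain text substitution.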

Oligos were synthesized by a Perkin Elmer 394 DNA/RNA Synthesizer, eluted from the columns
in 2ml NH4OH, and deprotected by 5 hours incubation at 56°C. The oligos were precipitated by
addition of 0.3M Na-acetate and 2 volumes ethanol. The samples were then centrifuged and the
pellets resuspended in either 100µl or 1ml of water. OD260 was determined using a Perkin Elmer
Lambda Bio spectrophotometer and the concentration was determined and adjusted to 2-10pmol/µl.

C) Amplification 

The standard PCR protocol was as follows: 50-200ng of genomic DNA were used as a template
in the presence of 20-40µM of each oligo, 400-800µM dNTPs solution, 1x PCR buffer (including
1.5mM MgCl2), 2.5 units TaqI DNA polymerase (using Perkin-Elmer AmpliTaq, GIBCO
Platinum, Pwo DNA polymerase, or Takara Shuzo Taq polymerase).

In some cases, PCR was optimised by the addition of 10µl DMSO or 50µl 2M betaine.

After a hot start (adding the polymerase during a preliminary 3 minute incubation of the whole mix
at 95°C), each sample underwent a double-step amplification: the first 5 cycles were performed
using the hybridization temperature of the oligos excluding the restriction enzyme tail,
followed by 30 cycles performed at the hybridization temperature of the whole-length
oligos. The cycles were followed by a final 10 minute extension step at 72°C.

The standard cycles were as follows:

                  Denaturation        Hybridisation          Elongation
First 5 cycles    30 seconds, 95°C    30 seconds, 50-55°C    30-60 seconds, 72°C
Last 30 cycles    30 seconds, 95°C    30 seconds, 65-70°C    30-60 seconds, 72°C

The elongation time varied according to the length of the ORF to be amplified. 

The amplifications were performed using either a 9600 or a 2400 Perkin Elmer GeneAmp PCR 
System. To check the results, 1/10 of the amplification volume was loaded onto a 1-1.5% agarose 
gel and the size of each amplified fragment compared with a DNA molecular weight marker. 

The amplified DNA was either loaded directly on a 1% agarose gel or first precipitated with ethanol
and resuspended in a suitable volume to be loaded on a 1% agarose gel. The DNA fragment
corresponding to the right size band was then eluted and purified from gel, using the Qiagen Gel
Extraction Kit, following the instructions of the manufacturer. The final volume of the DNA
fragment was 30µl or 50µl of either water or 10mM Tris, pH 8.5.

D) Digestion of PCR fragments

The purified DNA corresponding to the amplified fragment was split into 2 aliquots and
double-digested with:

- NdeI/XhoI or NheI/XhoI for cloning into pET-21b+ and further expression of the protein
as a C-terminus His-tag fusion

- BamHI/XhoI or EcoRI/XhoI for cloning into pGEX-KG and further expression of the
protein as N-terminus GST fusion.

- For ORF 76, NheI/BamHI for cloning into pTRC-HisA vector and further expression
of the protein as N-terminus His-tag fusion.

- EcoRI/PstI, EcoRI/SalI, SalI/PstI for cloning into pGex-His and further expression of
the protein as N-terminus His-tag fusion

Each purified DNA fragment was incubated (37°C for 3 hours to overnight) with 20 units of each
restriction enzyme (New England Biolabs) in a final volume of either 30 or 40µl in the presence of
the appropriate buffer. The digestion product was then purified using the QIAquick PCR
purification kit, following the manufacturer's instructions, and eluted in a final volume of 30 or
50µl of either water or 10mM Tris-HCl, pH 8.5. The final DNA concentration was determined by
1% agarose gel electrophoresis in the presence of titrated molecular weight marker.

E) Digestion of the cloning vectors (pET22B, pGEX-KG, pTRC-His A, and pGex-His)

10µg plasmid was double-digested with 50 units of each restriction enzyme in 200µl reaction
volume in the presence of appropriate buffer by overnight incubation at 37°C. After loading the
whole digestion on a 1% agarose gel, the band corresponding to the digested vector was purified
from the gel using the Qiagen QIAquick Gel Extraction Kit and the DNA was eluted in 50µl of
10mM Tris-HCl, pH 8.5. The DNA concentration was evaluated by measuring OD260 of the sample,
and adjusted to 50µg/µl. 1µl of plasmid was used for each cloning procedure.

The vector pGEX-His is a modified pGEX-2T vector carrying a region encoding six histidine
residues upstream of the thrombin cleavage site and containing the multiple cloning site of the
vector pTRC99 (Pharmacia).

F) Cloning 

The fragments corresponding to each ORF, previously digested and purified, were ligated in both pET22b
and pGEX-KG. In a final volume of 20µl, a molar ratio of 3:1 fragment/vector was ligated using 0.5µl
of NEB T4 DNA ligase (400 units/µl), in the presence of the buffer supplied by the manufacturer.
The reaction was incubated at room temperature for 3 hours. In some experiments, ligation was
performed using the Boehringer "Rapid Ligation Kit", following the manufacturer's instructions.
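
A 3:1 fragment/vector molar ratio translates into masses only once the lengths of the two molecules are known. The sketch below shows the arithmetic; the vector and insert sizes used in the example are hypothetical, as is the function name.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert (ng) that gives the requested insert:vector molar ratio.

    For double-stranded DNA, moles are proportional to mass / length, so
    insert_ng = ratio * vector_ng * insert_bp / vector_bp.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("lengths must be positive")
    return molar_ratio * vector_ng * insert_bp / vector_bp


# Example with assumed sizes: 50 ng of a 5000 bp vector and a 600 bp fragment.
needed_ng = insert_mass_ng(vector_ng=50, vector_bp=5000, insert_bp=600)
```

With these assumed sizes, 18 ng of fragment would be combined with 50 ng of vector.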

In order to introduce the recombinant plasmid in a suitable strain, 100µl E.coli DH5 competent
cells were incubated with the ligase reaction solution for 40 minutes on ice, then at 37°C for 3
minutes, then, after adding 800µl LB broth, again at 37°C for 20 minutes. The cells were then
centrifuged at maximum speed in an Eppendorf microfuge and resuspended in approximately 200µl
of the supernatant. The suspension was then plated on LB ampicillin (100mg/ml).

The screening of the recombinant clones was performed by growing 5 randomly-chosen colonies
overnight at 37°C in either 2ml (pGEX or pTRC clones) or 5ml (pET clones) LB broth + 100µg/ml
ampicillin. The cells were then pelleted and the DNA extracted using the Qiagen QIAprep Spin
Miniprep Kit, following the manufacturer's instructions, to a final volume of 30µl. 5µl of each
individual miniprep (approximately 1µg) were digested with either NdeI/XhoI or BamHI/XhoI and
the whole digestion loaded onto a 1-1.5% agarose gel (depending on the expected insert size), in
parallel with the molecular weight marker (1Kb DNA Ladder, GIBCO). The screening of the
positive clones was made on the basis of the correct insert size.

For the cloning of ORFs 110, 111, 113, 115, 119, 122, 125 & 130, the double-digested PCR
product was ligated into double-digested vector using EcoRI/PstI cloning sites or, for ORFs 115
& 127, EcoRI/SalI or, for ORF 122, SalI/PstI. After cloning, the recombinant plasmids were
introduced in the E.coli host W3110. Individual clones were grown overnight at 37°C in L-broth
with 50µg/ml ampicillin.

G) Expression 

Each ORF cloned into the expression vector was transformed into the strain suitable for expression
of the recombinant protein product. 1µl of each construct was used to transform 30µl of E.coli
BL21 (pGEX vector), E.coli TOP10 (pTRC vector) or E.coli BL21-DE3 (pET vector), as described
above. In the case of the pGEX-His vector, the same E.coli strain (W3110) was used for initial
cloning and expression. Single recombinant colonies were inoculated into 2ml LB+Amp
(100µg/ml), incubated at 37°C overnight, then diluted 1:30 in 20ml of LB+Amp (100µg/ml) in
100ml flasks, making sure that the OD550 ranged between 0.1 and 0.15. The flasks were incubated
at 30°C in gyratory water bath shakers until the OD indicated exponential growth suitable for
induction of expression (0.4-0.8 OD for pET and pTRC vectors; 0.8-1 OD for pGEX and pGEX-His
vectors). For the pET, pTRC and pGEX-His vectors, the protein expression was induced by
addition of 1mM IPTG, whereas in the case of pGEX system the final concentration of IPTG was
0.2mM. After 3 hours incubation at 30°C, the final concentration of the sample was checked by
OD. In order to check expression, 1ml of each sample was removed, centrifuged in a microfuge,
the pellet resuspended in PBS, and analysed by 12% SDS-PAGE with Coomassie Blue staining.
The whole sample was centrifuged at 6000g and the pellet resuspended in PBS for further use.

H) GST-fusion proteins large-scale purification. 

A single colony was grown overnight at 37°C on LB+Amp agar plate. The bacteria were inoculated
into 20ml of LB+Amp liquid culture in a water bath shaker and grown overnight. Bacteria were
diluted 1:30 into 600ml of fresh medium and allowed to grow at the optimal temperature (20-37°C)
to OD550 0.8-1. Protein expression was induced with 0.2mM IPTG followed by three hours
incubation. The culture was centrifuged at 8000rpm at 4°C. The supernatant was discarded and the
bacterial pellet was resuspended in 7.5ml cold PBS. The cells were disrupted by sonication on ice
for 30 sec at 40W using a Branson sonifier B-15, frozen and thawed twice and centrifuged again.
The supernatant was collected and mixed with 150µl Glutathione-Sepharose 4B resin (Pharmacia)
(previously washed with PBS) and incubated at room temperature for 30 minutes. The sample was
centrifuged at 700g for 5 minutes at 4°C. The resin was washed twice with 10ml cold PBS for 10
minutes, resuspended in 1ml cold PBS, and loaded on a disposable column. The resin was washed
twice with 2ml cold PBS until the flow-through reached OD280 of 0.02-0.06. The GST-fusion
protein was eluted by addition of 700µl cold Glutathione elution buffer (10mM reduced
glutathione, 50mM Tris-HCl) and fractions collected until the OD280 was 0.1. 21µl of each fraction
were loaded on a 12% SDS gel using either Biorad SDS-PAGE Molecular weight standard broad
range (M1) (200, 116.25, 97.4, 66.2, 45, 31, 21.5, 14.4, 6.5 kDa) or Amersham Rainbow Marker
(M2) (220, 66, 46, 30, 21.5, 14.3 kDa) as standards. As the MW of GST is 26kDa, this value must
be added to the MW of each GST-fusion protein.

I) His-fusion solubility analysis (ORFs 111-129)

To analyse the solubility of the His-fusion expression products, pellets of 3ml cultures were
resuspended in buffer M1 [500µl PBS pH 7.2]. 25µl lysozyme (10mg/ml) was added and the
bacteria were incubated for 15 min at 4°C. The pellets were sonicated for 30 sec at 40W using a
Branson sonifier B-15, frozen and thawed twice and then separated again into pellet and
supernatant by a centrifugation step. The supernatant was collected and the pellet was resuspended
in buffer M2 [8M urea, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] and incubated for 3 to
4 hours at 4°C. After centrifugation, the supernatant was collected and the pellet was resuspended
in buffer M3 [6M guanidinium-HCl, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] overnight
at 4°C. The supernatants from all steps were analysed by SDS-PAGE.

The proteins expressed from ORFs 113, 119 and 120 were found to be soluble in PBS, whereas
ORFs 111, 122, 126 and 129 needed urea and ORFs 125 and 127 needed guanidinium-HCl for their
solubilization.

J) His-fusion large-scale purification. 

A single colony was grown overnight at 37°C on a LB + Amp agar plate. The bacteria were
inoculated into 20ml of LB+Amp liquid culture and incubated overnight in a water bath shaker.
Bacteria were diluted 1:30 into 600ml fresh medium and allowed to grow at the optimal
temperature (20-37°C) to OD550 0.6-0.8. Protein expression was induced by addition of 1mM IPTG
and the culture further incubated for three hours. The culture was centrifuged at 8000rpm at 4°C,
the supernatant was discarded and the bacterial pellet was resuspended in 7.5ml of either (i) cold
buffer A (300mM NaCl, 50mM phosphate buffer, 10mM imidazole, pH 8) for soluble proteins or
(ii) buffer B (urea 8M, 10mM Tris-HCl, 100mM phosphate buffer, pH 8.8) for insoluble proteins.

The cells were disrupted by sonication on ice for 30 sec at 40W using a Branson sonifier B-15,
frozen and thawed two times and centrifuged again.

For insoluble proteins, the supernatant was stored at -20°C, while the pellets were resuspended in 2ml
buffer C (6M guanidine hydrochloride, 100mM phosphate buffer, 10mM Tris-HCl, pH 7.5) and
treated in a homogenizer for 10 cycles. The product was centrifuged at 13000rpm for 40 minutes.

Supernatants were collected and mixed with 150µl Ni2+-resin (Pharmacia) (previously washed with
either buffer A or buffer B, as appropriate) and incubated at room temperature with gentle agitation
for 30 minutes. The sample was centrifuged at 700g for 5 minutes at 4°C. The resin was washed
twice with 10ml buffer A or B for 10 minutes, resuspended in 1ml buffer A or B and loaded on a
disposable column. The resin was washed at either (i) 4°C with 2ml cold buffer A or (ii) room
temperature with 2ml buffer B, until the flow-through reached OD280 of 0.02-0.06.

The resin was washed with either (i) 2ml cold 20mM imidazole buffer (300mM NaCl, 50mM
phosphate buffer, 20mM imidazole, pH 8) or (ii) buffer D (urea 8M, 10mM Tris-HCl, 100mM
phosphate buffer, pH 6.3) until the flow-through reached the OD280 of 0.02-0.06. The His-fusion
protein was eluted by addition of 700µl of either (i) cold elution buffer A (300mM NaCl, 50mM
phosphate buffer, 250mM imidazole, pH 8) or (ii) elution buffer B (urea 8M, 10mM Tris-HCl,
100mM phosphate buffer, pH 4.5) and fractions collected until the OD280 was 0.1. 21µl of each
fraction were loaded on a 12% SDS gel.

K) His-fusion proteins renaturation 

10% glycerol was added to the denatured proteins. The proteins were then diluted to 20µg/ml using
dialysis buffer I (10% glycerol, 0.5M arginine, 50mM phosphate buffer, 5mM reduced glutathione,
0.5mM oxidised glutathione, 2M urea, pH 8.8) and dialysed against the same buffer at 4°C for
12-14 hours. The protein was further dialysed against dialysis buffer II (10% glycerol, 0.5M arginine,
50mM phosphate buffer, 5mM reduced glutathione, 0.5mM oxidised glutathione, pH 8.8) for 12-14
hours at 4°C. Protein concentration was evaluated using the formula:

Protein (mg/ml) = (1.55 x OD280) - (0.76 x OD260)
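
The concentration formula can be applied directly to a pair of absorbance readings. The sketch below reads the two absorbances as OD280 and OD260, as in the commonly used form of this nucleic-acid correction; the function name and example readings are hypothetical.

```python
def protein_mg_per_ml(od280, od260):
    """Protein concentration (mg/ml) = 1.55 * OD280 - 0.76 * OD260."""
    return 1.55 * od280 - 0.76 * od260


# Example with assumed readings: OD280 = 1.0 and OD260 = 0.5 give about 1.17 mg/ml.
example = protein_mg_per_ml(1.0, 0.5)
```

The second term subtracts the contribution of contaminating nucleic acid, which absorbs more strongly at 260 nm than at 280 nm.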

L) His-fusion large-scale purification (ORFs 111-129)

500ml of bacterial cultures were induced and the fusion proteins were obtained soluble in buffer
M1, M2 or M3 using the procedure described above. The crude extract of the bacteria was loaded
onto a Ni-NTA superflow column (Qiagen) equilibrated with buffer M1, M2 or M3 depending
on the solubilization buffer of the fusion proteins. Unbound material was eluted by washing the
column with the same buffer. The specific protein was eluted with the corresponding buffer
containing 500mM imidazole and dialysed against the corresponding buffer without imidazole.
After each run the columns were sanitized by washing with at least two column volumes of 0.5M
sodium hydroxide and reequilibrated before the next use.

M) Mice immunisations

20µg of each purified protein were used to immunise mice intraperitoneally. In the case of ORFs
2, 4, 15, 22, 27, 28, 37, 76, 89 and 97, Balb-C mice were immunised with Al(OH)3 as adjuvant on
days 1, 21 and 42, and immune response was monitored in samples taken on day 56. For ORFs 44,
106 and 132, CD1 mice were immunised using the same protocol. For ORFs 25 and 40, CD1 mice
were immunised using Freund's adjuvant, rather than Al(OH)3, and the same immunisation
protocol was used, except that the immune response was measured on day 42, rather than 56.
Similarly, for ORFs 23, 32, 38 and 79, CD1 mice were immunised with Freund's adjuvant, but the
immune response was measured on day 49.

N) ELISA assay (sera analysis) 

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and
inoculated into 7ml of Mueller-Hinton Broth (Difco) containing 0.25% glucose. Bacterial growth
was monitored every 30 minutes by following OD620. The bacteria were left to grow until the OD
reached the value of 0.3-0.4. The culture was centrifuged for 10 minutes at 10000rpm. The
supernatant was discarded and bacteria were washed once with PBS, resuspended in PBS
containing 0.025% formaldehyde, and incubated for 2 hours at room temperature and then
overnight at 4°C with stirring. 100µl bacterial cells were added to each well of a 96 well Greiner
plate and incubated overnight at 4°C. The wells were then washed three times with PBT washing
buffer (0.1% Tween-20 in PBS). 200µl of saturation buffer (2.7% Polyvinylpyrrolidone 10 in
water) was added to each well and the plates incubated for 2 hours at 37°C. Wells were washed




three times with PBT. 200µl of diluted sera (dilution buffer: 1% BSA, 0.1% Tween-20, 0.1% NaN3
in PBS) were added to each well and the plates incubated for 90 minutes at 37°C. Wells were
washed three times with PBT. 100µl of HRP-conjugated rabbit anti-mouse (Dako) serum diluted
1:2000 in dilution buffer were added to each well and the plates were incubated for 90 minutes at
37°C. Wells were washed three times with PBT buffer. 100µl of substrate buffer for HRP (25ml
of citrate buffer pH5, 10mg of O-phenylenediamine and 10µl of H2O) were added to each well and the
plates were left at room temperature for 20 minutes. 100µl H2SO4 was added to each well and OD490
was followed. The ELISA was considered positive when OD490 was 2.5 times that of the respective
pre-immune sera.
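
The positivity criterion amounts to a simple ratio test. In the sketch below, "2.5 times" is read as "at least 2.5 times", which is an assumption, and the function name and readings are hypothetical.

```python
def elisa_positive(od490_immune, od490_preimmune, fold=2.5):
    """True when the immune serum reading is at least `fold` times the pre-immune one."""
    return od490_immune >= fold * od490_preimmune


# Example with assumed readings for one serum pair.
call = elisa_positive(od490_immune=1.0, od490_preimmune=0.3)
```

Each immune serum is compared with the pre-immune serum from the same immunisation, so the threshold moves with the background of that particular serum.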

O) FACScan bacteria Binding Assay procedure.

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and
inoculated into 4 tubes containing 8ml each Mueller-Hinton Broth (Difco) containing 0.25%
glucose. Bacterial growth was monitored every 30 minutes by following OD620. The bacteria were
left to grow until the OD reached the value of 0.35-0.5. The culture was centrifuged for 10 minutes
at 4000rpm. The supernatant was discarded and the pellet was resuspended in blocking buffer (1%
BSA, 0.4% NaN3) and centrifuged for 5 minutes at 4000rpm. Cells were resuspended in blocking
buffer to reach OD620 of 0.07. 100µl bacterial cells were added to each well of a Costar 96 well
plate. 100µl of diluted (1:200) sera (in blocking buffer) were added to each well and plates
incubated for 2 hours at 4°C. Cells were centrifuged for 5 minutes at 4000rpm, the supernatant
aspirated and cells washed by addition of 200µl/well of blocking buffer. 100µl of
R-Phycoerythrin conjugated F(ab)2 goat anti-mouse, diluted 1:100, was added to each well and plates
incubated for 1 hour at 4°C. Cells were spun down by centrifugation at 4000rpm for 5 minutes and
washed by addition of 200µl/well of blocking buffer. The supernatant was aspirated and cells
resuspended in 200µl/well of PBS, 0.25% formaldehyde. Samples were transferred to FACScan
tubes and read. The FACScan settings were: FL1 on, FL2 and FL3 off; FSC-H
threshold: 92; FSC PMT Voltage: E 02; SSC PMT: 474; Amp. Gains: 7.1; FL-2 PMT: 539;
compensation values: 0.

P) OMV preparations

Bacteria were grown overnight on 5 GC plates, harvested with a loop and resuspended in 10 ml 20mM
Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes and the bacteria disrupted by
sonication for 10 minutes on ice (50% duty cycle, 50% output). Unbroken cells were removed by
centrifugation at 5000g for 10 minutes and the total cell envelope fraction recovered by centrifugation
at 50000g at 4°C for 75 minutes. To extract cytoplasmic membrane proteins from the crude outer
membranes, the whole fraction was resuspended in 2% sarkosyl (Sigma) and incubated at room
temperature for 20 minutes. The suspension was centrifuged at 10000g for 10 minutes to remove
aggregates, and the supernatant further ultracentrifuged at 50000g for 75 minutes to pellet the outer
membranes. The outer membranes were resuspended in 10mM Tris-HCl, pH8 and the protein
concentration measured by the Bio-Rad Protein assay, using BSA as a standard.

Q) Whole Extracts preparation 

Bacteria were grown overnight on a GC plate, harvested with a loop and resuspended in 1ml of 
20mM Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes. 

R) Western blotting

Purified proteins (500ng/lane), outer membrane vesicles (5µg) and total cell extracts (25µg) derived
from MenB strain 2996 were loaded on 15% SDS-PAGE and transferred to a nitrocellulose
membrane. The transfer was performed for 2 hours at 150mA at 4°C, in transferring buffer (0.3%
Tris base, 1.44% glycine, 20% methanol). The membrane was saturated by overnight incubation
at 4°C in saturation buffer (10% skimmed milk, 0.1% Triton X100 in PBS). The membrane was
washed twice with washing buffer (3% skimmed milk, 0.1% Triton X100 in PBS) and incubated
for 2 hours at 37°C with mice sera diluted 1:200 in washing buffer. The membrane was washed
twice and incubated for 90 minutes with a 1:2000 dilution of horseradish peroxidase labelled
anti-mouse Ig. The membrane was washed twice with 0.1% Triton X100 in PBS and developed with
the Opti-4CN Substrate Kit (Bio-Rad). The reaction was stopped by adding water.

S) Bactericidal assay 

MC58 strain was grown overnight at 37°C on chocolate agar plates. 5-7 colonies were collected and
used to inoculate 7ml Mueller-Hinton broth. The suspension was incubated at 37°C on a nutator
and left to grow until OD620 was 0.5-0.8. The culture was aliquoted into sterile 1.5ml Eppendorf
tubes and centrifuged for 20 minutes at maximum speed in a microfuge. The pellet was washed
once in Gey's buffer (Gibco) and resuspended in the same buffer to an OD620 of 0.5, diluted
1:20000 in Gey's buffer and stored at 25°C.

50µl of Gey's buffer/1% BSA was added to each well of a 96-well tissue culture plate. 25µl of
diluted mice sera (1:100 in Gey's buffer/0.2% BSA) were added to each well and the plate
incubated at 4°C. 25µl of the previously described bacterial suspension were added to each well.
25µl of either heat-inactivated (56°C waterbath for 30 minutes) or normal baby rabbit complement
were added to each well. Immediately after the addition of the baby rabbit complement, 22µl of
each sample/well were plated on Mueller-Hinton agar plates (time 0). The 96-well plate was
incubated for 1 hour at 37°C with rotation and then 22µl of each sample/well were plated on
Mueller-Hinton agar plates (time 1). After overnight incubation the colonies corresponding to time
0 and time 1 hour were counted.
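
The text does not state how the time 0 and time 1 colony counts were compared. One common way to summarise such counts, shown here purely as an assumption, is the percentage of colony-forming units surviving after the one-hour incubation; the function name and counts are hypothetical.

```python
def percent_survival(cfu_time0, cfu_time1):
    """Colonies at time 1 expressed as a percentage of colonies at time 0."""
    if cfu_time0 <= 0:
        raise ValueError("time 0 count must be positive")
    return 100.0 * cfu_time1 / cfu_time0


# Example with assumed counts: 80 colonies at time 0 and 20 after one hour.
survival = percent_survival(80, 20)
```

Comparing this value for wells with normal complement against wells with heat-inactivated complement, and against preimmune or GST control sera, would separate antibody-dependent killing from background effects.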

Table II (page 493) gives a summary of the cloning, expression and purification results.

Example 1

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 1>:

1 ATGAAACAGA CAGTCAA.AT GCTTGCCGCC GCCCTGATTG CCTTGGGCTT

51 GAACCGACCG GTGTGGNCGG ATGACGTATC GGATTTTCGG GAAAACTTGC

101 A.GCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG

151 TAT.TACAAA GGACGCGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG

201 GTATCGGCAG CCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG

251 GCTGGATGTA TGCCAACGGG CGCGC.GTGC GCCAAGATGA TACCGAAGCG

301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA

351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG

401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA

451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGANCGC GCGTGCGCCA

501 AGACCG...

This corresponds to the amino acid sequence <SEQ ID 2; ORF37>: 

1 MKQTVXMLAA ALIALGLNRP VWXDDVSDFR ENLXAAAQGN AAAQYNLGAM

51 YXQRTRVRRD DAEAVRWYRQ PAEQGLAQAQ YNLGWMYANG RXVRQDDTEA

101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ

151 AQNNLGVMYA ERXRVRQD...

Further work revealed the complete nucleotide sequence <SEQ ID 3>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCGAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG 

151 TATTACAAAG GACGCGGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG 

201 GTATCGGCAG GCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG 

251 GCTGGATGTA TGCCAACGGG CGCGGCGTGC GCCAAGATGA TACCGAAGCG 

301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA 

351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG 

401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA 

451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGACGCG GCGTGCGCCA 

501 AGACCGCGCC CTTGCACAAG AATGGTTTGG CAAGGCTTGT CAAAACGGAG 

551 ACCAAGACGG CTGCGACAAT GACCAACGCC TGAAGGCGGG TTATTGA 
This corresponds to the amino acid sequence <SEQ ID 4; ORF37-1>: 

1 MKQTVKWLAA ALIALGLNRA VWADDVSDFR ENLQAAAQGN AAAQYNLGAM 

51 YYKGRGVRRD DAEAVRWYRQ AAEQGLAQAQ YNLGWMYANG RGVRQDDTEA 

101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ 

151 AQNNLGVMYA ERRGVRQDRA LAQEWFGKAC QNGDQDGCDN DQRLKAGY* 
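
The correspondence between a nucleotide listing and its amino acid listing can be checked by translating in frame from the first base. The sketch below assumes the standard genetic code; the helper name `translate` is illustrative and does not come from the original text. Codons containing ambiguous or unread bases (such as the `N` and `.` positions in the partial sequence) are reported as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unknown codons become 'X', stops become '*'."""
    seq = "".join(dna.split()).upper()  # drop the spaces between 10-base blocks
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of SEQ ID 3 as listed above.
print(translate("ATGAAACAGA CAGTCAAATG GCTTGCCGCC"))  # MKQTVKWLAA
```

The printed residues match the first ten residues of SEQ ID 4.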

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 5>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCAAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AAAACAATTT GGGCGTGATG 

151 TATGCCGAAA GACGCGGCGT GCGCCAAGAC CGCGCCCTTG CACAAGAATG 

201 GCTTGGCAAG GCTTGTCAAA ACGGATACCA AGACAGCTGC GACAATGACC 

251 AACGCCTGAA AGCGGGTTAT TGA 

This encodes a protein having amino acid sequence <SEQ ID 6; ORF37a>: 

1 MKQTVKWLAA ALIALGLNQA VWADDVSDFR ENLQAAAQGN AAAQNNLGVM 
51 YAERRGVRQD RALAQEWLGK ACQNGYQDSC DNDQRLKAGY * 

The originally-identified partial strain B sequence (ORF37) shows 68.0% identity over a 75aa 
overlap with ORF37a: 

10 20 30 40 50 60 

90 nrf37 oeo MKOTVXMLAA ALIALGLNRPVTO DDVSDFRENLXAAAQGNAAAQYKLGAMYXQRTRVRRD 

zu orn .pep — j-nTTTTTTTn n iiiihmm 1 1 1 n 1 1 1 1 1 m i : n = i mi 

orf37a MKOTVKMIAAALIAIjGLNQAWA DDVSDFRENLQAAAQGNAAAQNNI^VMYAERRGVRQD 
10 20 30 40 50 60 

<yc 70 80 90 100 110 120 

or f 3 1 . pep DAEAVRW yrqpaeqglaqaqynlgwmyangrxvrqddteavrwyrqaaaqgwqaqynlg 

||:| : : : I 
orf37a raLAQEWLGKACQNGYQDSCDNDQRLKAGYX 
70 80 90 

Further work identified the corresponding gene in N. gonorrhoeae <SEQ ID 7>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCAAGCG GTGTGGGCGG GTGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGgcggcaGA ACaggGAAAT GCAGCAGCCC AATTCAATTT GGGCGTGATG 

151 TATGAAAATG GACAAGGAGT TCGTCAAGAT TATGTACAGG CAGTGCAGTG 

201 GTATCGCAAG GCTTCAGAAC AAGGGGATGC CCAAGCCCAA TACAATTTGG 

251 GCTTGATGTA TTACGATGGA CGCGGCGTGC GCCAAGACCT TGCGCTCGCT 

301 CAACAATGGC TTGGCAAGGC TTGTCAAAAC GGAGACCAAA ACAGCTGCGA 

351 CAATGACCAA CGCCTGAAGG CGGGTTATTA A 

This encodes a protein having amino acid sequence <SEQ ID 8; ORF37ng>: 

1 MKQTVKWLAA ALIALGLNQA VWAGDVSDFR ENLQAAEQGN AAAQFNLGVM 

51 YENGQGVRQD YVQAVQWYRK ASEQGDAQAQ YNLGLMYYDG RGVRQDLALA 
101 QQWLGKACQN GDQNSCDNDQ RLKAGY* 

The originally-identified partial strain B sequence (ORF37) shows 64.9% identity over a 111aa 
overlap with ORF37ng: 




orf37.pep 


MKQTVXMIAAALIAI^LNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD 

inn Minium: n m i u 1 1 1 1 1 1 m i m i t = 1 1 1 : w • ■ I - ■ 

KKQT VKW LAAAL I ALGLN QAVW AGDVS D FRENLQ AAE QGN AAAQ FN LG VMYEN GQG VRQ D 


60 


orf 37ng 


60 


orf 37 .pep 


DAEAVRWYRQPAEQGIAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGWQAQYN^ 

::||:|||: :MI I 1 1 1 1 1 1 1 II : II 1 1 1 1 ' 1 '\ -1 : > 
YVQAVQWYRKASEQGDAQAQ YN LGLMYYDGRG VRQDLALAQQWLGKACQNGDQN SCDNDQ 


120 


orf 37ng 


120 


orf 37 .pep 


VIYAEGRGVRQDDVEAVRWFROAAAQGVAQAQNNLGVMYAERXRVRQD 1 68 




orf 37ng 


RLKAGY 126 
The complete strain B sequence (ORF37-1) and ORF37ng show 51.5% identity in 198 aa overlap: 



orf 37-1. pep 
orf37ng 



10 20 30 40 50 60 

MKQTVKW LAAAL I ALG LN RAVW AD DV S D FREN LQAAAQGN AAAQ YN LGAMYYKGRG VRR D 

!lM!iitlll|||||||:ltll i Illllt I I I I I M : M I : II 

MKQTVKWIAAALIAI^LNQAWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD 

10 20 30 40 50 60 



70 80 90 100 HO 120 

orf 37-1 pep DAEAVRWYRQAAEQGLAQAQYNLGWMYANGRGVRQDDTEAVRWYRQAAAQGVVQAQYNLG 

10 ' :: 11:111: 1:1 II II III HI •> Mil Mil 

or f37nq YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQD 

70 80 ^0 

130 140 150 160 170 180 
15 orf 37-1 pep V I YAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMY AERRGVRQDRALAQEW FGKAC 
lJ * F II 11:1: MM 

orf37ng LALAQQWLGKAC 

20 i9 ° i" 

or f 37-1. pep QNGDQDGCDNDQRLKAGYX 
M II I : : M M II M I M I 
orf37ng QNGDQNSCDNDQRLKAGYX 
110 120 
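
The identity figures quoted for these alignments can be reproduced from a pair of aligned rows. The sketch below is illustrative only: it assumes gaps are written as `-` and that gap columns are left out of the overlap length, which may differ from the convention of the alignment program used for the original analysis. The function name is hypothetical.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the non-gap columns of two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gap columns do not count towards the overlap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no overlapping residues")
    return 100.0 * identical / compared


# C-terminal residues of ORF37-1 and ORF37ng, taken from SEQ ID 4 and SEQ ID 8.
orf37_1_tail = "QNGDQDGCDNDQRLKAGY"
orf37ng_tail = "QNGDQNSCDNDQRLKAGY"
print(f"{percent_identity(orf37_1_tail, orf37ng_tail):.1f}% identity over 18 aa")
```

For this 18-residue stretch, 16 positions are identical, giving 88.9%; the whole-sequence figures quoted in the text are lower because they include the divergent central region.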

Computer analysis of these amino acid sequences indicates a putative leader sequence, and it was 
predicted that the proteins from N.meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF37-1 (11kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
1A shows the results of affinity purification of the GST-fusion protein, and Figure 1B shows the 
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise 
mice, whose sera were used for ELISA (positive result), FACS analysis (Figure 1C), and a 
bactericidal assay (Figure 1D). These experiments confirm that ORF37-1 is a surface-exposed 
protein, and that it is a useful immunogen. 

Figure 1E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF37-1. 
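
A profile of this kind is usually a sliding-window average of a per-residue scale. The sketch below is an illustration under stated assumptions, not the method used for the figure: the text does not name the scale or window, so the Kyte-Doolittle hydropathy values and a 7-residue window are assumed here, and the function name is hypothetical. On this scale positive values are hydrophobic, so hydrophilic stretches appear as negative averages.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    seq = "".join(protein.split()).upper().rstrip("*")
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]  # raises KeyError on X or other symbols
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


# First 30 residues of SEQ ID 4 (ORF37-1), which include the putative leader.
profile = hydropathy_profile("MKQTVKWLAA ALIALGLNRA VWADDVSDFR")
peak = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophobic window starts at residue {peak + 1}")
```

A strongly hydrophobic window near the N-terminus is the kind of feature that is consistent with a leader sequence, although a dedicated signal-peptide predictor would be needed to support that reading.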
Example 2 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 9>: 

TTCGGCGA CATCGGCGGT TTGAAGGTCA ATGCCCCCGT CAAATCCGCA 

GGCGTATTGG TCGGGCGCGT CGGCGCTATC GGACTTGACC CGAAATCCTA 
TCAGGCGAGG GTGCGCCTCG ATTTGGACGG CAAGTATCAG TTCAGCAGCG 

ACGTTTCCGC GCAAATCCTG ACTTCsGGAC TTTTGGGCGA GCAGTACATC 

GGGCTGCAGC AGGGCGGCGA CACGGAAAAC CTTGCTGCCG GCGACACCAT 

CTCCGTAACC AGTTCTGCAA TGGTTCTGGA AAACCTTATC GGCAAATTCA 

TGACGAGTTT TGCCGAGAAA AATGCCGACG GCGGCAATGC GGAAAAAGCC 
GCCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 10>: 

1 FGDIGGLKVN APVKSAGVLV GRVGAIGLDP KSYQARVRLD LDGKYQFSSD 
51 VSAQILTSGL LGEQYIGLQQ GGDTENLAAG DTISVTSSAM VLENLIGKFM 
101 TSFAEKNADG GNAEKAAE* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a hypothetical H.influenzae protein (ybrd.haein; accession number p45029) 

SEQ ID 9 and ybrd.haein show 48.4% aa identity in 122 aa overlap: 

on 30 40 50 60 70 

yrbd. h l/;iGALVFl£l*VANVQGFAETKSYTVTA^ 

' " KVNAPVKSAGVLVGRVGAIGLD? 

10 20 30 



10 



25 



30 



35 



FGDIGGLKVNAPVKSAGVLVGRVGAIGLD? 
N.Itl , A nr-. in 



on 90 100 11C 120 130 

yrbd h KSYLPKX'SIAINQEYNEIPENSSLSIKTSGLLGEQYIALTMGFDDGDTAMLKNGSOIQDT 

' i | I . • i :::::: I : : : : : I I I 1 1 11 I I I M : I I I I I : I i : I • 
N m KSYQARVRLDLDGKY-QFSSDVSAQILTSGLLGEQYIGLQQG GDTENLAAGDTISVT 
J5 40 50 60 *?0 80 

140 150 160 

yrbd . h TSAMVLEDLIGQFL — YGSKKSDGNEKSESTEQ 
: | | I | | I : H I : I : : : : I : : I I : : : : : : I : 
90 N m S SAMVLEN LI GKFMT S FAEKNADGGNAEKAAEX 

90 100 HO 120 

Homology with a predicted ORF from N. gonorrhoeae 

SEQ ID 9 shows 99.2% identity over a 118aa overlap with a predicted ORF from N. gonorrhoeae: 



20 30 40 50 60 70 

yrbd GAAAVAr IJVFRVAGGAAFGGSDKTYAVYADFGDIGG^ 

FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP 
N ' m 10 20 30 

90 100 110 120 130 

yrbd KS 



N.m 



80 90 100 110 AJU 

SVOARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM 

. 1 77 1 1 1 1 | t 1 1 1 1 II I 1 1 I I 1 1 1 1 M 1 I I I I I I M 1 1 U M M I I M I I ! 1 1 1 1 1 I M I 
KSYQARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM 
40 50 60 70 B0 90 



140 150 160 

vrbd VLENLIGKFMTSFAEKNAEGGNAEKAAEX 
I I III I I I I I I 1 ! I M I I : M I M I I M I 
Aft M m VLEN LI GKFMT S FAEKNADGGNAEKAAEX 

HU 100 110 120 

The complete yrbd H.influenzae sequence has a leader sequence and it is expected that the 
full-length homologous N.meningitidis protein will also have one. This suggests that it is either a 
membrane protein, a secreted protein, or a surface protein and that the protein, or one of its 
epitopes, could be a useful antigen for vaccines or diagnostics. 

Example 3 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 11>: 

1 ATTTTGATAT ACCTCATCCG CAAGAATCTA GGTTCGCCCG TCTTCTTCTT 

51 **TCAGGAACGC CCCGGAAAGG ACGGAAAACC TTTTAAAATG GTCAAATTCC 

101 GTTCCATGCG CGACGGCTTG TATTCAGACG GCATTCCGCT GCCCGACGGA 

151 GAACGCCTGA CACCGTTCGG CAAAAAACTG CGTGCCGcCA GTwTGGACGA 

201 ACTGCCTGAA TTATGGAATA TCTTAAAAGG CGAGATGAGC CTGGTCGGCC 

251 CCCGCCCGCT GCTGATGCAA TATCTGCCGC TGTACGACAA CTTCCAAAAC 

301 CGCCGCCACG AAATGAAACC CGGCATTACC GGCTGGGCGC AGGTCAACGG 
351 GCGCAACGCg CTTTCGTGGG ACGAAAAATT CGCCTGCGAT GTTTGGTATA 

401 TCGACCACTT CAGCCTGTGC CTCGACATCA AAATCCTACT GCTGACGGTT 

451 AAAAAAGTAT TAATCAAGGA AGGGATTTCC GCACAGGGCG AACA.aCCAT 

501 GCCCCCTTTC ACAGGAAAAC GCAAACTCGC CGTCGTCGGT GCGGGCGGAC 

551 ACGGAAAAGT CGTTGCCGAC CTTGCCGCCG CACTCGGCCG GTACAGGGAA 

601 ATCGTTTTTC TGGACGACCG CGCACAAGGC AGCGTCAACG GCTTTTCCGT 

651 CATCGGCACG ACGCTGCTGC TTGAAAACAG TTTATCGCCC GAACAATACG 

701 ACGTCGCCGT CGCCGTCGGC AACAACCGCA TCCGCCGCCA AATCGCCGAA 

751 AAAGCCGCCG CGCTCGGCTT CGCCCTGCCC GTACTGGTTC ATCCGGACGC 

801 GACCGTCTCG CCTTCTGCAA CAGTCGGACA AGGCAGCGTC GTTATGGCGA 

851 AAGCGGTCG. . 

This corresponds to the amino acid sequence <SEQ ID 12; ORF3>: 

1 ..ILIYLIRKNL GSPVFFFQER PGKDGKPFKM VKFRSMRDGL YSDGIPLPDG 

51 ERLTPFGKKL RAASXDELPE LWNILKGEMS LVGPRPLLMQ YLPLYDNFQN 

101 RRHEMKPGIT GWAQVNGRNA LSWDEKFACD VWYIDHFSLC LDIKILLLTV 

151 KKVLIKEGIS AQGEXTMPPF TGKRKLAVVG AGGHGKVVAD LAAALGRYRE 

201 IVFLDDRAQG SVNGFSVIGT TLLLENSLSP EQYDVAVAVG NNRIRRQIAE 

251 KAAALGFALP VLVHPDATVS PSATVGQGSV VMAKAV.. 

Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 13>: 

1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 

51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 

101 AGAATCTAGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 

151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCGCG ACGCGCTTGA 

201 TTCAGACGGC ATTCCGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA 

251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCTGAATT ATGGAATATC 

301 TTAAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA 

351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCCG 

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 

451 GAAAAATTCG CCTGCGATGT TTGGTATATC GACCACTTCA GCCTGTGCCT 

501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAGGAAG 

551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 

601 AAACTCGCCG TCGTCGGTGC GGGCGGACAC GGAAAAGTCG TTGCCGACCT 

651 TGCCGCCGCA CTCGGCCGGT ACAGGGAAAT CGTTTTTCTG GACGACCGCG 

701 CACAAGGCAG CGTCAACGGC TTTTCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATACGAC GTCGCCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 

851 CCCTGCCCGT TCTGGTTCAT CCGGACGCGA CCGTCTCGCC TTCTGCAACA 

901 GTCGGACAAG GCAGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCAGGCAG 

951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ACTGCCTGCT TAACGCTTTC GTCCACATCA GCCCAGGCGC GCACCTGTCG 

1051 GGCAACACGC ATATCGGCGA AGAAAGCTGG ATAGGCACGG GCGCGTGCAG 

1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG 

1151 TCGTCGTACG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAATCCGGCA 

1201 AAGCCGCTGC CGCGCAAAAA CCCCGAGACC TCGACAGCAT AA 

This corresponds to the amino acid sequence <SEQ ID 14; ORF3-1>: 

1 MSKFFKRLFD IVASASGLIF LSPVFLILIY LIRKNLGSPV FFFQERPGKD 

51 GKPFKMVKFR SMRDALDSDG IPLPDGERLT PFGKKLRAAS LDELPELWNI 

101 LKGEMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD 

151 EKFACDVWYI DHFSLCLDIK ILLLTVKKVL IKEGISAQGE ATMPPFTGKR 

201 KLAVVGAGGH GKVVADLAAA LGRYREIVFL DDRAQGSVNG FSVIGTTLLL 

251 ENSLSPEQYD VAVAVGNNRI RRQIAEKAAA LGFALPVLVH PDATVSPSAT 

301 VGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLNAF VHISPGAHLS 

351 GNTHIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA 

401 KPLPRKNPET STA* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF3 shows 93.0% identity over a 286aa overlap with an ORF (ORF3a) from strain A of N. meningitidis: 



10 20 30 
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„ rf3 D eo ILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 
° r£ P P TnTiTM M HI Mil I II || mi Ml I II I H 

0 rf3a mstcpp^t^tvaS ASGLIFLSPVFLILIYLI RKWLGSPVFFFQERPGKDGKPFKMVKFR 
10 20 30 40 50 60 

5 40 50 60 70 80 90 

orf3 pep SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL 
11:!:, MIi HIM IMNIlllllll | | I I I I M M I I : I I I I I I I I I M M I M 
orf3a SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL 
10 70 80 90 100 HO 120 

100 HO 120 130 140 150 

o-f3 pep YmTTOTPmiFHKP^- TT ™* n ™rcPN&T<;Mnv^ 
~ | | | | | | | | | | | I I I I I M I I I I I I I I I I I II : I M I H I I I I I I 1 I I I I I I I I I I I I I I I 

1 < orf 3a Y PttFCflrJHFMr p ^ T ^ w * n ™ r » RM *^^^ 

130 140 150 160 170 180 

160 170 180 190 200 210 

orf 3 Pep IKEGISAQGEXTMPPFTGKRK1AWGAGGHGKWADLAAALGRYREIVFLDDRAQGSVNG 
20 Tlllllllll t 1 1 i 1 1 1 I I I M I 1 1 1 M 1 1 I I M : I I I I M I I I 1 I I I I I: II I I M 

orf 3a iKEGISAQGEATMPPFTGKRKLAWGAGGHGKWAELAAALGTYGEIVFLDDRVQGSVNG 
190 200 210 220 230 240 

220 230 240 250 260 270 

7S orf3 oeo FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 

* - | || M Mil I I II Mil: I: milMIIIMIIMMI Mil III Mill MM I II I- 

orf 3a "PVIGTTliLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT 
250 260 270 280 290 300 

30 280 

or f 3 . pep VGQGSWMAKAV 

orf 3a vgOGGWMwJvW^ 

310 320 330 340 350 360 

The complete length ORF3a nucleotide sequence <SEQ ID 15> is: 

1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 

51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 

101 AGAATCTGGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 

151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCACG ACGCGCTTGA 

201 TTCAGACGGC ATTCTGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA 

251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCCGAACT GTGGAACGTC 

301 CTCAAAGGCG ACATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA 

351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCGG 

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 

451 GAACGCTTCG CATGCGACAT CTGGTATATC GACCACTTCA GCCTGTGCCT 

501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAAGAAG 

551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 

601 AAACTTGCCG TCGTCGGTGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT 

651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCG 

701 TCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCGCCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 

851 CCCTGCCCGT CCTGATTCAT CCGGACTCGA CCGTCTCGCC TTCTGCAACA 

901 GTCGGACAAG GCGGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCTGACAG 

951 CGTATTGAAA GACGGCGTAA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ATTGCCTGCT TGATGCTTTC GTCCACATCA GCCCGGGCGC GCACCTGTCG 

1051 GGCAACACGC GTATCGGCGA AGAAAGCTGG ATAGGCACAG GCGCGTGCAG 

1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG 

1151 TCGTCGTGCG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAACCCGGCA 

1201 AAACCATTGG CAGGCAAAAA TACCGAGACC CTGCGGTCGT AA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 16>: 

1 MSKFFKRLFD IVASASGLIF LSPVFLILIY LIRKNLGSPV FFFQERPGKD 

51 GKPFKMVKFR SMHDALDSDG ILLPDGERLT PFGKKLRAAS LDELPELWNV 

101 LKGDMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD 

151 ERFACDIWYI DHFSLCLDIK ILLLTVKKVL IKEGISAQGE ATMPPFTGKR 

201 KLAVVGAGGH GKVVAELAAA LGTYGEIVFL DDRVQGSVNG FPVIGTTLLL 

251 ENSLSPEQFD IAVAVGNNRI RRQIAEKAAA LGFALPVLIH PDSTVSPSAT 
301 VGQGGVVMAK AVVQADSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS 
351 GNTRIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA 
401 KPLAGKNTET LRS* 

Two transmembrane domains are underlined 

ORF 3-1 shows 94.6% identity in 410 aa overlap with ORF3a: 

10 20 30 40 50 60 

orf3a.pep mskffkrlfdivasasgliflspvfliliylirknlgspvfffqerpgkdgkpfkmvkfr 

{ I I I M M t I I i I M i I I I I I I I I i I 1 I I I I ! I ! I I I M 1 I I M I I I t M M I I M I 1 I I 

orf 3-1 mskffkrlfdivasasgliflspvfliliylirknlgspvfffqerpgkdgkpfkmvkfr 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 3a . pep SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL 
M:MII1III I II I I I I I II I I I I II II 1 I I I I I I I I : I II : II I I II I I II I I I I I » 
or f3-l SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 3a . pep YDNFQNRRHEMKPGITGWAQVNGRNALSWDERFACDIWYIDHFSLCLDIKILLLTVKKVL 

I I M | I I II I i I I I t I I i I I II I I I I I I I I I : I I I I : I I 11 I I I I I I I I I I I i I I I I I I I 
orf 3-1 YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf3apep I KEG I SAQGEATMPPFTGKRKLAWGAGGHGKWAELAAALGT YGE I VFLDDRVQGSVNG 

II || 1 I I I I I I I I I I I I II II I I I I I I I I II I I I I : I I I I I I I 11111111:111111 
orf 3-1 I KEG I S AQGE ATMP P FTGKRKLAWGAGGHGKWADLAAALGRYRE I V FL DDRAQGSVNG 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 3a . pep FPVIGTTLLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT 
I | I I I I I I t I I I I I I I I : I : I I I I 1 I I I I I 1 I I I I I I I I M I I 1 1 I I : I I I : I II I I I I 
orf 3-1 FS V IGTT LLLEN SLSPEQYDVAVAVGNNR I RRQI AEKAAALG FAL PVLVH PDATVS PS AT 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 3a . pep VGQGGVVMAKAWQADSVLKDGVI VNTAATVDHDCLLDABVH1 S PGAHLSGNTRIGEESW 

I I I! : I I I I I I I I I I I I t I I I I I I I I I I 1 I I I I I I I : I I 11 I M I M I I I I I : I I I I t 1 
orf 3-1 VGQGSWMAKAWQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW 

310 320 330 340 350 360 

370 380 390 400 410 

or f 3a . pep IGTGACSRQQIRIGSRATIGAGAWVRDVSDGMTVAGNPAKPLAGKNTETLRSX 

t I I I I I I I 1 1 I I I I I I I M 1 1 1 I 1 I 1 I 1 1 1 I I 1 I I I 1 I 1 1 1 1 I H M 
orf 3-1 IGTGACSRQQIRIGSRATIGAGAWVRDVSDGMTVAGNPAKPLPRKNPETSTAX 

370 380 390 400 410 

Homology with hypothetical protein encoded by yvfc gene (accession Z71928) of B. subtilis 
ORF3 and YVFC proteins show 55% aa identity in 170 aa overlap (BLASTp): 

0RF3 3 IYLIRKNI^SPVFFFQERPGKDGKPFKMVKFRSMRDGLYSDGIPLPDGERLTPFGKKLRA 62 

I ++R +GSPVFF Q RPG GKPF + KFR+M D S G LPD RLT G+ +R 
yvfc 27 IAWRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTDERDSKGNLLPDEVRLTKTGRLIRK 86 

ORF3 63 ASXDELPELWNIIiCGEMSLVGPRPLLMQYLPLYDNFQNRRHEMKPGITGWAQVNGRNALS 122 

S DELP+L N+LKG++SLVGPRPLLM YLPLY Q RRHE+KPGITGWAQ+NGRNA+S 
yvfc 87 LSI DELPQLLNVLKGDLSLVGPRPLLMDYLPLYTEKQARRHEVKPGITGWAQINGRNAI S 146 

ORF3 123 WDEKFACDVWYIDHFSLCLDXXXXXXXXXXXXXXEGISAQGEXTMPPFTG 172 

W++KF DVWY+D++S LD EGI T FTG 

yvfc 147 WEKKFELDWYVDNWSFFLDLKIUrLTVRKVLVSEGIQQTNHVTAERFTG 196 
Homology with a predicted ORF from N. gonorrhoeae 

ORF3 shows 86.3% identity over a 286aa overlap with a predicted ORF (ORF3.ng) from N. gonorrhoeae: 

orf3 
orf 3ng 
orf3 
orf3ng 
orf3 
orf 3ng 
orf3 
orf 3ng 
orf3 
orf 3ng 
orf 3 
orf3ng 



ILIYLI RKNLGSPVFFFQERPGKDGKPFKMVKFR 
: I I I I I I I I I I M I I : : II I I I I I I I I I II I LI 
MSKAVKRLFDIIASA.qrnT.TVLSPVFLVLIYLIRKNKGSPVFFIRERPGKDGKPFKMVKFR 



34 



60 



94 



SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL 
I I I I : I 11111111:111! 1111111:1 II II I I I I : M I I I II II II I I I I I I I I I 
SMRDALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 120 



YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 
I :: I I I I I I I I I I I 1 I I I I I I M I I I ! I M I M : M I I i |:|j: I I : I I I : I I II I I I 
YNKFQNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVL 



154 



180 



214 



IKEGISAQGEXTMPPFTGKRKLAWGAGGHGKWADLAAALGRYREIVFLDDRAQGSVNG 
I I I I II I I I I I I I I I : I : I I I I I : I I I i I I I i I I : I i I I It I I I I I I I I I : I I I I I I 
I KEG I S AQGEATM P P FAGN RKLAV I G AGGH GKWAE LAAALGT YGE I VFLD DRTQG S VNG 240 

FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 274 
I II I II II 1 I I I I I I I I : I : : I I I I I I I I I I I I : I : I I II I I I II I : I I I I I I I I I I 
FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQITENAAALGFKLPVLIHPDATVSPSAI 300 

VGQG S WMAKAV 286 
: I I I I I I I I I I I 

IGQGSWMAKAWQAGSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR 360 



The complete length ORF3ng nucleotide sequence <SEQ ID 17> is: 

1 ATGAGTAAAG CCGTCAAACG CCTGTTCGAC ATCATCGCAT CCGCATCGGG 
51 GCTGATTGTC CTGTCGCCCG TGTTTTTGGT TTTAATATAC CTCATCCGCA 
101 AAAACTTAGG TTCGCCCGTC TTCTTCattC GGGAACGCCc cgGAAAGGAc 
151 ggaaaacCTT TTAAAATGGT CAAATTCCGT TCCAtgcgcg acgcgcttGA 
201 TTCAGACGGC ATTCCGCTGC CCGATAGCGA ACGCCTGACC GATTTCGGCA 
251 AAAAATTACG CGCCACCAGT TTGGACGAAC TTCCTGAATT ATGGAATGTC 
301 CTCAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTTT TGATGCAGTA 
351 TCTGCCGCTT TACAACAAAT TTCAAAACCG CCGCCACGAA ATGAAACCGG 
401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 
451 GAAAAGTTCT CCTGCGATGT TTGGTACACC GACAATTTCA GCTTTTGGCT 
501 GGATATGAAA ATCCTGTTTC TGACAGTCAA AAAAGTCTTG ATTAAAGAAG 
551 GCATTTCGGC GCAAGGGGAA GCCACCATGC CCCCTTTCGC GGGGAATCGC 
601 AAACTCGCCG TTATCGGCGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT 
651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCA 
701 CCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 
751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCACCGTCG CCGTCGGCAA 
801 CAACCGCATC CGCCGCCAAA TCACCGAAAA CGCCGCCGCG CTCGGCTTCA 
851 AACTGCCCGT TCTGATTCAT CCCGACGCGA CCGTCTCGCC TTCTGCAATA 
901 ATCGGACAAG GCAGCGTCGT AATGGCGAAA GCCGTCGTAC AGGCCGGCAG 
951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 
1001 ACTGCCTGCT TGACGCTTTC GtccaCATCA GCCCGGGCGC GCACCTGTCG 
1051 GGCAACACGC GTATCGGCGA AGAAAGCCGG ATAGGCACGG GCGCGTGCAG 
1101 CCGCCAGCAG ACAACCGTCG GCAGCGGGGT TACCgccgGT GCAGGGgcGG 
1151 TTATCGTATG CGACATCCCG GACGGCATGA CCGTCGCGGG CAACCCGGCA 
1201 AAGCCCCTTA CGGGCAAAAA CCCCAAGACC GGGACGGCAT AA 



This encodes a protein having amino acid sequence <SEQ ID 18>: 

1 MSKAVKRLFD IIASASGLIV LSPVFLVLIY LIRKNLGSPV FFIRERPGKD 
51 GKPFKMVKFR SMRDALDSDG IPLPDSERLT DFGKKLRATS LDELPELWNV 
101 LKGEMSLVGP RPLLMQYLPL YNKFQNRRHE MKPGITGWAQ VNGRNALSWD 
151 EKFSCDVWYT DNFSFWLDMK ILFLTVKKVL IKEGISAQGE ATMPPFAGNR 
201 KLAVIGAGGH GKVVAELAAA LGTYGEIVFL DDRTQGSVNG FPVIGTTLLL 
251 ENSLSPEQFD ITVAVGNNRI RRQITENAAA LGFKLPVLIH PDATVSPSAI 
301 IGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS 
351 GNTRIGEESR IGTGACSRQQ TTVGSGVTAG AGAVIVCDIP DGMTVAGNPA 
401 KPLTGKNPKT GTA* 

This protein shows 86.9% identity in 413 aa overlap with ORF3-1: 
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orf 3-1. pep 
orf 3ng 

orf 3-1. pep 
orf3ng 

orf 3-1. pep 
orf3ng 

orf 3-1. pep 
orf 3ng 

orf 3-1. pep 
orf3ng 

orf 3-1. pep 
orf3ng 



10 20 30 40 50 60 

MSKFn<RI.FDIVASASGLIFLSPVFLILlYLIRKNWSPVFFFQERPGKDGKP^Kre 

.Ti ll I I 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 : 1 I ■ ' I » 1 1 m 1 1 1 1 I I I M 1 ' ' iUl'iii 
^V^LFMIASASG^ 

10 20 30 40 ^ u 
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70 80 90 100 HO 120 

SM RD/%LDSDGIPLPDGERLTPFGKKLRAASL^ 

70 80 9° 100 



130 140 150 160 no 180 

YDNFONRRHEMK^ITGWAQVUGRNALSWDEKFAC^ 

,. . ,7, , I, I II I I I | I I I I i I I I I I | | |l II : I I III I: II: I I : I I I • II I I I I I 
YNKFQNR^EMKPGITGWAQ^ 

130 140 150 160 1'0 XDU 

190 200 210 220 230 240 

IKEGISAQGEATMPPFTGKRKIAWGAGGHGKWADIAAA 

250 260 270 280 290 300 

FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAA^ 

i till I I II 11111111:1:: IMMIIIIIIIsIs Mill I I I I : I I I 1 1 M Ml 
FPVIGTTLLLENSLSPE^ 



250 
310 



260 
320 



270 
330 



280 
340 



290 
350 



300 
360 



310 320 330 340 350 360 

51,1 W^wi^ 

310 320 330 340 350 360 



IGQGS' 



310 
370 



320 330 340 350 

370 380 390 400 410 

orf3ng IGTGACSRQQTTVGSGVTAGAGAVIVCDIPDGMTVAGNPA^PLTG 

In addition, ORF3ng shows significant homology with a hypothetical protein from B.subtilis 



biosynthesis (Bacillus subtilis] Length = 202 
Score = 235 bits (594), Expect = 3e-61 
Identities = 114/195 (58%), Positives = 142/195 (72%) 



S Q 113/173 uot> ; / — • 

Sbjcf 3 U^FDLTARIFLLCCTSVIILmAWBlKIGSPVFFKQVRPGLHGKPETLYKFBTMTD 

— »■ » ^^^^^^^^^^^ z 

__?r..I... rl^™, TotrT cTnri.pnT.T.NVl.KGDLSLVGPRPLLMDYLPLYTEK 122 



Query: 5 
Sbjct: 3 



Query: bD ^^^I" RLT G+ +R S +DELP+L NVLKG++SLVGPRPLLM YLPLY + 
Sbjct: 63 nJMM 

Sbjct: 123 S^SSJt 

Query: 185 I S AQGEATM PP FAGN 199 

I T F G+ 

Sbjct: 183 IQQTNHVTAERFTGS 197 



122 
184 
182 
The hypothetical product of yvfc gene shows similarity to EXOY of R.meliloti, an 
exopolysaccharide production protein. Based on this and on the two predicted transmembrane 
regions in the homologous N. gonorrhoeae sequence, it is predicted that these proteins, or their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 4 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 19>: 

1 . .AACCATATGG CGATTGTCAT CGACGAATAC GGCGGCACAT CCGGCTTGGT 

51 CACCTTTGAA GACATCATCG AGCAAATCGT CGGCGAAATC GAAGACGAGT 

101 TTGACGAAGA CGATAGCGCC GACAATATCC ATGCCGTTTC TTCAGACACG 

151 TGGCGCATCC ATGCAGCTAC CGAAATCGAA GACATCAACA CCTTCTTCGG 

201 CACGGAATAC AGCATCGAAG AAGCCGACAC CATT . GGCGG CCTGGTCATT 

251 CAAGAGTTGG GACATCTGCC CGTGCGCGGC GAAAAAGTCC TTATCGGCGG 

301 TTTGCAGTTC ACCGTCGCAC GCGCCGACAA CCGCCGCCTG CATACGCTGA 

351 TGGCGACCCG CGTGAAGTAA GC ACCGC CGTTTCTGCA 

401 CAGTTTAG 

This corresponds to amino acid sequence <SEQ ID 20; ORF5>: 

1 . .NHMAIVIDEY GGTSGLVTFE DIIEQIVGEI EDEFDEDDSA DNIHAVSSDT 
51 WRIHAATEIE DINTFFGTEY SIEEADTIXR PGHSRVGTSA RARRKS PYRR 
101 FAVHRRTRRQ PPPAYADGDP REVS XR RFCTV* 

Further sequence analysis revealed the complete DNA sequence to be <SEQ ID 21>: 

1 ATGGACGGCG CACAACCGAA AACGAATTTT TTTGAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 

101 AGGCGCACGA GCAGGAAGTT TTTGATGCGG ATACGCTTTT AAGATTGGAA 

151 AAAGTCCTCG ATTTTTCCGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 

201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAG CGCATCACCG 

251 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 

301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTTAACCCC GAGCAGTTCC ACCTCAAATC CATTCTCCGC CCCGCCGTCT 

401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCGAT TGTCATCGAC GAATACGGCG GCACATCCGG 

501 CTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGC GAAATCGAAG 

551 ACGAGTTTGA CGAAGACGAT AGCGCCGACA ATATCCATGC CGTTTCTTCC 

601 GAACGCTGGC GCATCCATGC AGCTACCGAA ATCGAAGACA TCAACACCTT 

651 CTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATT CGGCCTGGTC 

701 ATTCAAGAGT TGGGACATCT GCCCGTGCGC GGCGAAAAAG TCCTTATCGG 

751 CGGTTTGCAG TTCACCGTCG CACGCGCCGA CAACCGCCGC CTGCATACGC 

801 TGATGGCGAC CCGCGTGAAG TAAGCACCGC CGTTTCTGCA CAGTTTAGGA 

851 TGACGGTACG GGCGTTTTCT GTTTCAATCC GCCCCATCCG CCAAACATAA 

This corresponds to amino acid sequence <SEQ ID 22; ORF5-l>: 



1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLLRLE 

51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 

101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG EIEDEFDEDD SADNIHAVSS 

201 ERWRIHAATE IEDINTFFGT EYSSEEADTI RPGHSRVGTS ARARRKSPYR 

251 RFAVHRRTRR QPPPAYADGD PREVSTAVSA QFRMTVRAFS VSIRPIRQT* 

Further work identified the corresponding gene in strain A of N.meningitidis <SEQ ID 23 >: 



1 ATGGACGGCG CACAACCGAA AACAAATTTT TTNNAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTGACC CTGTTGCGCC 

101 AAGCGCACGA ACAGGAAGTA TTTGATGCGG ATACGCTTTT AAGATTGGAA 

151 AAAGTCCTCG ATTTTTCTGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 

201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAA CGCATCACCG 

251 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGTGAAGAC 

301 AAAGACGAAG TTTTGGGTAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTCAACCCC GAGCAGTTCC ACCTCAAATC GATATTGCGC CCTGCCGTCT 
401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 caSScc atatggcaat cgtcatcgac gaatacggcg gcacgtcggg 

501 TTTGGTAACT TTTGAAGACA TCATCGAGCA AATCGTCGGC gCATCGAAG 

551 ATGAGTTTGA CGAAGACGAA AGCGCGGACA ACATCCACGC CGTTTCCGCC 

601 GAACGCTGGC GCATCCACGC GGCTACCGAA ATCGAAGACA TCAACGCCTT 

5 115 GAATACAGCA GCGAAGAAGC CGACACCATC GGCGGCCNTG 

701 GTCATTCAGG AATTGGNACA CCTGCCCGTG CGCGCCGAAA mgtcnttat 

751 cggcghnttg canttcacng tcgccngcgc ngacaaccgc cgcctgcata 
boi cgSggc gacccgcgtg aagtaagctc cgccgtttct gtacagttta 
10 851 SSgS acgggcgttt tctgtttcaa tccgccccat ccgccanaca 

901 TAA 

This encodes a protein having amino acid sequence <SEQ ID 24; ORF5a>: 

1 MDGAQPKTNF XXRLIARLAR EPDSAEDVLT LLRQAHEQEV FDADTLLRLE 
51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADNIHAVSA 

201 ERWRIHAATE IEDINAFFGT EYSSEEADTI GGXGHSGIGT PARARRKSXY 
251 RRXAXHXRXR XQPPPAYADG DPREVSSAVS VQFRMTVRAF SVSIRPIRXT 
301 * 

The originally-identified partial strain B sequence (ORF5) shows 54.7% identity over a 124aa 
overlap with ORF5a: 

10 20 30 

NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 

otf5 -P ep Mill Mill 1 I I 1 1 I I I 1 I Mill 111:1 

25 orf5a FHLKSILRPAVFVPEGK^LTALLKEFR^QRNHMMVIDEYGGTSGLV^FEDIIEOIVGDI 

40 50 60 70 80 90 

orfS pep EDEFDEDDSADNIHAVSSDTVWIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 
orfS.pep 1 1 1 1| 1 1 • | 1 1 1 1 1 1 1 1 : : 1 1 1 1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 1 1 I I III II III HI I 
30 orf5a EDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGTPA 

190 200 210 220 230 240 



100 HO 120 130 

35 or£5.pep RARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSXXXXXRRFCTV 

orfSa JJJJJoxliJalx^^ 

250 260 270 280 290 JUU 

The complete strain B sequence (ORF5-1) and ORF5a show 92.7% identity in 300 aa overlap: 

40 



otf5a.pep 



10 20 30 40 50 60 

orf5-l iiGAQpUiFFERilAR^PDSAEDV^LLRQAH^ 

45 7Q 80 90 100 110 120 
or.5a.pep 

orf5-l ro^rS^™ 

jQ -70 80 90 100 110 1ZO 

130 140 150 160 170 180 

55 otf5-l EiF^tiipAVFVPEGKSLiALUCEFREQRNHMAIVIDEYGGTSGm 

JJ * w 130 140 150 160 170 180 

190 200 210 220 230 240 
orfSa pep DIEDEFDEDESAONIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGT 

orfSa.pep DIEDErufcoe. I I 1 1 1 1 1 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 1 I M 1 1 1 1 1 1 III :ll 

60 orf5 , eIedefdeddsadni^vsserwrihaateiedintffgteysseiadtirp-ghsrvgt 

° r£5 1 190 200 210 220 230 

250 260 270 280 290 300 
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orf 5a . pep pararrksxyrrxaxhxrxrxqpppayadgdprevssavsvqfrmtvrafsvsirpirxt 

M til I I |:l I I I I < I I t I I 1 I 1 I I = I I i r I t t I 1 i I 1 I 1 I t 1 I I I 1 I 

or f 5-1 SARARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSTAVSAQFRMTVRAFSVSIRPIRQT 
240 250 260 270 2 80 290 

Further work identified a partial DNA sequence in N.gonorrhoeae <SEQ ID 25> which encodes 
a protein having amino acid sequence <SEQ ID 26; ORF5ng>: 

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 

51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 

101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 

201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 

251 RRFAVHRRPR RQPPPAHADG DPREVSRACP HRRFCTV* 

Further analysis revealed the complete gonococcal nucleotide sequence <SEQ ID 27> to be: 

1 ATGGACGGCG CACAACCGAA AACAAATTTT TTTGAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC

101 AGGCGCACGA ACAGGAAGTT TTTGATGCCG ACACACTGAC CCGGCTGGAA 

151 AAAGTATTGG ACTTTGCCGA GCTGGAAGTG CGCGATGCGA TGATTACGCG 

201 CAGCCGCATG AACGTATTGA AAGAAAACGA CAGCATCGAA CGCATCACCG 

251 CCTACGTCAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 

301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT

351 GTTCAACCCC GAGCAGTTCC ACCTGAAATC CGTCTTGCGC CCTGCCGTTT 

401 TCGTGCCCGA AGGCAAATCT TTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 

501 TTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGT GACATCGAAG 

551 ACGAGTTTGA CGAAGACGAA AGCGccgacg acatCCACTC cgTTTccgCC

601 GAACGCTGGC GCATCCacgc ggctaCCGAA ATCGAAGaca TCAACGCCTT 

651 TTTCGGTACG GAatacggca gcgaagaagc cgacaccatc cggcggctTG 

701 GTCATTCAGG AATTGGGACA CCTGCCCGTG CGCGGCGAAA AAGTCCTTAt

751 cggcgGTTTG Cagttcaccg tCGCCCGCGC CGACAACCGC CGCCTGCACA 

801 CGCTGATGGC GACCCGCGTG AAGTAAGCAG AGCCTGCCcg AccgccgttT

851 CTGCacAGTT TAGGatgACG gtaCGGTCGT TTTCTGTTTC AATCCGCCCC 

901 ATCCGCCAAA CATAA 

This encodes a protein having amino acid sequence <SEQ ID 28; ORF5ng-1>:

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 

51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED

101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 

201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 

251 RRFAVHRRPR RQPPPAHADG DPREVSRACP TAVSAQFRMT VRSFSVSIRP 

301 IRQT*

The originally-identified partial strain B sequence (ORF5) shows 83.1% identity over a 135aa
overlap with the partial gonococcal sequence (ORF5ng):

orfS NHMAIVIDEYGGTSGLVTFEDI IEQIVGEI 30 

I I I I M I i I I I 1 1 1 I I I I M II I I M I I: I 
45 orfSng FHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI 182 

orf5 EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 90 

II M I I I : I I I : I I: I I:: I I I I I I I I I I II I : I I I I I I : I I I I I I I 111 : I I I 
orf5ng EDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGTPA 242 

50 

orf 5 RARRKS PYRRFAVHRRTRRQPPPAYADG D PREVS X RRFCTV 131 

I I I I I I I I II I I I I I I 1111111:111111111 I I I I I I 

orfSng RARRKS PYRRFAVHRRPRRQPPPAHADGDPREVSRACPHRRFCTV 2B7 

The complete strain B and gonococcal sequences (ORF5-1 & ORF5ng-1) show 92.4% identity in
304 aa overlap:

10 20 30 40 50 60 

orfSng- 1 . pep MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLTRLE KVLDFAELEV 
orf5ng-1.pep



° 10 20 30 40 50 ou 

70 80 90 100 HO 120 

RDAMITRSRMNVLKENDSIERITAWIDTAHSRFPVIGEDKDEVWlUiAKDLI^^P 

ori ° 70 80 90 10° HO t 20 

10 130 140 150 160 170 180 

orf5nq-l.pep EQFHliCSVIJlPAVFVPEGKSLTALLKEFREQRNHMAIVIDEY 

orf5ng .pep j j | | 1 1 | : | 1 1 | | | I I I I I I I I 1 1 U I I 1 I I II I M 1 I I I II I 1 1 I I I I I Ml I 1 1 1 1 1 1 

orfS-1 EQraLKSII^PAVEVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDII^ 
15 or " 130 140 150 160 170 180 

190 200 210 220 230 240 

DIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSG1GT 

. I I 1 1 1 1 1 I : | 1 1 : I I : I 1 : 1 1 1 1 I 1 1 I I M I I I I : I 1 1 I 1 1 : I I I I I I I I IN = jl 
EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINTFFGTEYSSEEADTIRP-GHSRVGT 

190 200 210 220 230 



orf5ng-l .pep 
20 orf5-l 



250 260 270 280 290 300 

orf 5ng-l . pep PARARRKS PYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQFRMTVRS FSV S IRP 

25 mi in linn i mi mini: mm in imiiiiiiimhiiiiii 

" f5-1 S ARARRKS P YRRFAVHRRTRRQP PPAYADGDPREV S TAVSAQFRMTVRAFSVSIRP 

240 250 260 270 280 290 



30 or f 5ng-l . pep IRQTX 

1 1 1 n 

orf5-l IRQTX 
300 

Computer analysis of these amino acid sequences indicated a putative leader sequence and
identified the following homologies:

Homology with hemolysin homologue TlyC (accession U32716) of H.influenzae
ORF5 and TlyC proteins show 58% aa identity in 77 aa overlap (BLASTp). 

ORF5 2 HMAIVIDEYGGTSGLVTFEDIIEQIVGEIEDEFDEDDSADNIHAVSSDTWRIHAATEIED 61 
HMAIV+DE+G SGLVT EDI+EQIVG+IEDEFDE++ AD I +S T+ + A T+I+D 
TlyC 166 HMAIVVDEFGAVSGLVTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDD 224

ORF5 62 INTFFGTEYSIEEADTI 78 

N F T++ EE DTI 
TlyC 225 FNAQFNTDFDDEEVDTI 241 

ORF5ng-1 also shows significant homology with TlyC:

SCORES Init1: 301 Initn: 419 Opt: 668

Smith-Waterman score: 668; 45.9% identity in 242 aa overlap 

10 20 30 40 50 

50 orf5ng-l pep mdgaqpktofferliarlar-epdsaedvlnllrqaheqevfdadtltrlek 

I I I : I : : I : : I : I :::::: I :::::::: K si : I 
tlvc haein mndeqqnsnqsentkkpffqslfgrffqgelknreelvevirdseqndlidqntremieg 

- 10 20 30 40 50 60 

55 60 70 80 90 100 109 

orf5na-l oep VLDFAELEVRDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGE DKDEVLGILH 
9 | : : : | | I : III 1 1 II s s :::::::: : I : : I I I I I I II : : I : I s : : II II 

tlvc haein V^ffilAELRVRDIMIPRSQIIFIEDQQDLNTCLNTIIESAHSRFPVIADADDRDNIVGILH 
3 - 70 80 90 100 110 120 



HO 120 130 140 150 160 

orf5na-l oep AKDIJJCYMF-NPEQFHLKSVXRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGL 
9 Mllll:: : I I I s 1 : 1 1 1 : 1 : 1 1 1 : I : : 1 1 : II : I I II II : II : I : : I 1 1 

tlyc haein AKDLUCFIAEDAEVTDLSSLIAPWIVPESKRVDRMUCDFRSERFHMAI^ 



130 140 



150 160 HO 180 



170 180 190 200 210 220 

orf5na-l pep VTFEDIIEQIVGDIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFPGTEYGSEEAD 

tlvc haein VTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDDFNAQFKTDFDDEEVD 
tiyC - n 190 200 210 220 230 

230 240 250 260 270 280 

orf 5ng- 1 . pep TIRRLGHSGIG-TPARARWSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQF 

tlvc haein TlGGLIMOTrcYLPKRGEEIII^ 

tiycjiaem ^ ^ 2?Q 28Q 29Q 
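
The Smith-Waterman score quoted for the ORF5ng-1/TlyC comparison is the optimum of a local-alignment recurrence. The following Python sketch illustrates that recurrence only. The function name, the unit match/mismatch scores and the linear gap penalty are illustrative assumptions; they are not the substitution matrix or gap parameters that produced the score of 668, which are not stated in this text.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local-alignment score between a and b.

    Assumed toy scoring: fixed match/mismatch values and a linear gap
    penalty, not the matrix-based scoring of a real database search.
    """
    prev = [0] * (len(b) + 1)          # previous row of the score matrix
    best = 0
    for x in a:
        curr = [0]                     # first column is always zero
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    # Identical 8-residue peptides score 8 matches x 2.
    print(smith_waterman_score("HMAIVIDE", "HMAIVIDE"))
```

Under these assumed parameters the score depends only on the longest well-matching local region, which is why a partial sequence can still give a strong hit against a full-length database entry.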



Homology with a hypothetical secreted protein from E.coli:

ORF5a shows homology to a hypothetical secreted protein from E.coli: 

sp|P77392|YBEX_ECOLI HYPOTHETICAL 33.3 KD PROTEIN IN CUTE-ASNB INTERGENIC REGION
>gi|1778577 (U82598) similar to H. influenzae [Escherichia coli] >gi|1786819
(AE000170) f292; This 292 aa ORF is 23% identical (9 gaps) to 272 residues of an
approx. 440 aa protein YTFL_HAEIN SW: P44717 [Escherichia coli] Length = 292

Score = 212 bits (533), Expect = 3e-54

Identities = 112/230 (48%), Positives = 149/230 (64%), Gaps = 3/230 (1%) 

Query:   2 DGAQPKTNFXXRLIARLAR-EPDSAEDVLTLLRQAHEQEVFDADTLTRLEKVLDFSDLEV 60
           D    K  F   L+++L   EP + +++L L+R + + ++ D DT   LE V+D +D  V
Sbjct:  10 DTISNKKGFFSLLLSQLFHGEPKNRDELLALIRDSGQNDLIDEDTRDMLEGVMDIADQRV 69

Query:  61 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYM-FN 119
           RD MI RS+M  LK N +++     +I++AHSRFPVI EDKD + GIL AKDLL +M  +
Sbjct:  70 RDIMIPRSQMITLKRNQTLDECLDVIIESAHSRFPVISEDKDHIEGILMAKDLLPFMRSD 129

Query: 120 PEQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIV 179
            E F +  +LR AV VPE K +  +LKEFR QR HMAIVIDE+GG SGLVT EDI+E IV
Sbjct: 130 AEAFSMDKVLRQAVVVPESKRVDRMLKEFRSQRYHMAIVIDEFGGVSGLVTIEDILELIV 189

Query: 180 GDIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADT 229
           G+IEDE+DE++  D    +S   W + A   IED N  FGT +S EE DT
Sbjct: 190 GEIEDEYDEEDDID-FRQLSRHTWTVRALASIEDFNEAFGTHFSDEEVDT 238

Based on this analysis, including the amino acid homology to the TlyC hemolysin-homologue from
H. influenzae (hemolysins are secreted proteins), it was predicted that the proteins from 
N.meningitidis and N. gonorrhoeae are secreted and could thus be useful antigens for vaccines or 
diagnostics. 

ORF5-1 (30.7kDa) was cloned in the pGex vector and expressed in E.coli, as described above. The 
products of protein expression and purification were analyzed by SDS-PAGE. Figure 2A shows
the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein was used 
to immunise mice, whose sera were used for Western blot analysis (Figure IB). These experiments 
confirm that ORF5-1 is a surface-exposed protein, and that it is a useful immunogen. 

Example 5 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 29>:

1 ATGCGCGGCG GCAGGCCGGA TTCCGTTACC GTGCAGATTA TCGAAGGTTC 
51 GCGTTTTTCG CATATGAGGA AAGTCATCGA CGCAACGCCC GACATCGGAC 
101 ACGACACCAA AGGCTGGAGC AATGAAAAAC TGATGGCGGA AGTTGCGCCC

151 GATGCCTTCA GCGGCAATCC TGAAgGGCAG TTTTTCCCCG ACAGCTACGA 

201 AATCGATGCG GGCGGCAGTG ATTTGCAGAT TTACCAAACC GCCTACAAgG 

251 GCGATGCAAC GCCGCCTGAA TGA 2 GGCATG GGAAAGCAGG CAGGACGGGC 

301 TGCCTTATAA AAACCCTTAT GAAATGCTGA TTATGGCGAr CCTGGTCGAA

351 AAGGAAACAG GGCATGAAGC CGAsCsCGAC CATGTcGCTT CCGTCTTCGT 

401 CAACCGCCTG AAAATCGGTA TGCGCCTGCA AACCgAssCG TCCGTGATTT 

451 ACGGCATGGG TGCGGCATAC AAGGGCAAAA TCCGTAAAGC CGACCTGCGC 

501 CGCGACACGC CGTACAACAC CTACACGCGC GGCGGTCTGC CGCCAACCCC 

551 GATTGCGCTG CCC. .

This corresponds to the amino acid sequence <SEQ ID 30; ORF7>: 

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP 

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWESRQDGL 

101 PYKNPYEMLI MAXLVEKETG HEAXXDHVAS VFVNRLKIGM RLQTXXSVIY 

151 GMGAAYKGKI RKADLRRDTP YNTYTRGGLP PTPIALP. .

Further sequence analysis revealed the complete DNA sequence <SEQ ID 31>: 

1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTGTCGGC 

51 AGCCGTTTTC GCCGCGCTGC TTTTTGTTCC TAAGGATAAC GGCAGGGCAT 

101 ACCGAATCAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA 

151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC

201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGATTGC 

251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG 

301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT 

351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGGACACGAC ACCAAAGGCT 

401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CGCCCGATGC CTTCAGCGGC

451 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG

501 CAGTGATTTG CAGATTTACC AAACCGCCTA CAAGGCGATG CAACGCCGCC 

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT 

601 TATGAAATGC TGATTATGGC GAGCCTGGTC GAAAAGGAAA CAGGGCATGA 

651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG

701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA 

751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA 

801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATTGCG CTGCCCGGCA 

851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGCGAAAA ATACCTGTAT 

901 TTCGTGTCCA AAATGGACGG CACGGGCTTG AGCCAGTTCA GCCATGATTT

951 GACCGAACAC AATGCCGCCG TCCGCAAATA TATTTTGAAA AAATAA 

This corresponds to the amino acid sequence <SEQ ID 32; ORF7-1>:

1 MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK

51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 

101 PDSVTVQIIE GSRFSHMRKV IDATPDIGHD TKGWSNEKLM AEVAPDAFSG

151 NPEGQFFPDS YEIDAGGSDL QIYQTAYKAM QRRLNEAWES RQDGLPYKNP 

201 YEMLIMASLV EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA 

251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY 

301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K* 
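
The correspondence between a nucleotide listing and the amino acid sequence it encodes can be checked by translating the DNA with the standard genetic code. The following Python sketch does this for the first 48 nucleotides of <SEQ ID 31>. The names `CODON_TABLE` and `translate` are illustrative, and the choices to uppercase lower-case bases and to report unresolved codons as X are assumptions made for this example rather than conventions stated in the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon.

    Whitespace is ignored, lower-case bases are uppercased, and any codon
    containing a non-ACGT symbol is reported as 'X' (an assumption).
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3       # drop a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, usable, 3))


if __name__ == "__main__":
    # First 48 nucleotides of the complete ORF7 DNA sequence.
    start = "ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTGTCG"
    print(translate(start))        # MLRKLLKWSAVFLTVS
    # Last nine nucleotides of the same listing, ending in a stop codon.
    print(translate("AAAAAATAA"))  # KK*
```

The same check can be applied to any of the DNA/protein pairs in these examples, provided the listing is read in the frame that starts at its first nucleotide.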

Computer analysis of this amino acid sequence gave the following results:

Homology with hypothetical protein encoded by yceg gene (accession P44270) of H.influenzae
ORF7 and yceg proteins show 44% aa identity in 192 aa overlap: 

ORF7 1 MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMA EVAPDAFSG 55 

+ G + V+ IEG F RK ++ P + K SNE++ A ++ + 

50 yceg 102 I^SGKEVQFNVKWIEGKTFKDWRKDLENAPHLVQTLKDKSNEEIFALLDLPDIGQNLELK 161 



55 



ORF7 56 NPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLV 115 
N EG +PD+Y +DL++ + 4 + M++ LN+AW R + LP NPYEMLI+A +V 

162 NVEGWLYPDTYNYTPKSTDLELIiCRSAERMKKALNKAWNERDEDLPLANPYEMLIlA 221 



yceg 



ORF7 116 EKETGHEAXXDHVASVFVNRIJCIGMRLQTXXSVIYGMGAAYKGK 175 
EKETG VASVF+NRLK M+LQT +VIYGMG Y G IRK DL TPYNTY 

222 EKETGIANERAKVASVFINRLKA^KLQTDPTVIYGMGENYNGNIRKKDLETKTPYNTYV 281 



yceg 
ORF7 176 RGGLPPTPIALP 187

GLPPTPXA+P 
yceg 282 IDGLPPTPIAMP 293 



The complete length YCEG protein has sequence: 



1 Mifvn.T ATLL LILILAGVAS FS YYKMTEFV KTPVNVQADE LLTIERGTTS 

51 SKLATLFEQE KLIADGKLLP YLLKLKPELN KIKAGTYSLE NVKTVQDLLD 

101 LLNSGKEVQF NVKWIEGKTF KDWRKDLENA PHLVQTLKDK SNEEIFALLD

151 LPDIGQNLEL KNVEGWLYPD TYNYTPKSTD LELLKRSAER MKKALNKAWN

201 ERDEDLPLAN PYEMLILASI VEKETGIANE RAKVASVFIN RLKAKMKLQT

251 DPTVIYGMGE NYNGNIRKKD LETKTPYNTY VIDGLPPTPI AMPSESSLQA

301 VANPEKTDFY YFVADGSGGH KFTRNLNEHN KAVQEYLRWY RSQKNAK 

Homology with a predicted ORF from N.meningitidis (strain A)

ORF7 shows 95.2% identity over a 187aa overlap with an ORF (ORF7a) from strain A of N.
meningitidis:



or f 7. pep 
orf7a 



10 20 30 

MRGGRPDSVTVQIIEGSRFSHMRKVIDATP 
I I I I t I 1 1 1 I I t 1 I t I t I 1 I I I I I t I I t 1 I 
AAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDATP 



2 0 70 80 90 100 110 120 



orf7.pep 
25 orf7a 



40 50 60 70 80 90 

DIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLN 

, , | | | | | | | | | | II I I I I I I I I I II I I I I i I I I I I t I I I I I I I I : I I I I I I I i n I I I 
D IEHDT KGWSNEKLMAEVAPDAFSGN PEGQFFPDS YE I DAGG S DLRI YQI AYKAMQRRLN 
130 140 150 160 170 180 



100 110 120 130 140 150 

nrf7 DeD EAWESRQDGLPYKNPYEMLIMAXLVEKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIY 

30 ' P P | III ill) I III! III IIMII I: M II I M I 1 1 I I I I I I I I I I I I I I 1 1 I III! 

nrflA eawESRODGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSVIY 
190 200 210 220 230 240 

160 170 180 

orf7 pep GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALP 
| 1 1 | | I I 1 I 1 1 I I I I I I I 1 I I I t I 1 1 t I t I I I I 1 I I I 
nrf7a GMGAAYKGKIRKADLRRDT PYKT YTRGGLP PT PIALPGKAALDAAAHPSGEKYLYFVSKM 

250 260 270 280 290 300 

40 orf7a DGTGLSQFSHDLTEHNAAVRKYILKKX 

310 320 330 

The complete length ORF7a nucleotide sequence <SEQ ID 33> is: 

1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTATCGGC 

51 AGCCGTTTTC GCCGCGCTGC TTTTCGTCCC TAAAGACAAC GGCAGGGCAT 

101 ACAGGATTAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA

151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC 

201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGACTGC 

251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG 

301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT 

351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGAACACGAC ACCAAAGGCT

401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CCCCTGATGC CTTCAGCGGC 

451 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG 

501 CAGCGATTTA CGGATTTACC AAATCGCCTA CAAGGCGATG CAACGCCGAC 

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT 

601 TATGAAATGC TGATTATGGC GAGCCTGATC GAAAAGGAAA CAGGGCATGA

651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG 

701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA 

751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA 

801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATCGCG CTGCCCGGCA 

851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGTGAAAA ATACCTGTAT

901 TTCGTGTCCA AAATGGACGG TACGGGCTTG AGCCAGTTCA GCCATGATTT 

951 GACCGAACAC AACGCCGCCG TTCGCAAATA TATTTTGAAA AAATAA 
This is predicted to encode a protein having amino acid sequence <SEQ ID 34>:

1 MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK

51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 

101 PDSVTVQIIE GSRFSHMRKV IDATPDIEHD TKGWSNEKLM AEVAPDAFSG 

151 NPEGQFFPDS YEIDAGGSDL RIYQIAYKAM QRRLNEAWES RQDGLPYKNP

201 YEMLIMASLI EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA 

251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY 

301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K* 

A leader peptide is underlined. 
ORF7a and ORF7-1 show 98.8% identity in 331 aa overlap:

                  10        20        30        40        50        60
orf7a.pep MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf7-1    MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR
                  10        20        30        40        50        60

                  70        80        90       100       110       120
orf7a.pep HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKV
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf7-1    HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKV
                  70        80        90       100       110       120

                 130       140       150       160       170       180
orf7a.pep IDATPDIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAM
          ||||||| ||||||||||||||||||||||||||||||||||||||||||:||| |||||
orf7-1    IDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAM
                 130       140       150       160       170       180

                 190       200       210       220       230       240
orf7a.pep QRRLNEAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTD
          |||||||||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf7-1    QRRLNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTD
                 190       200       210       220       230       240

                 250       260       270       280       290       300
orf7a.pep PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf7-1    PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY
                 250       260       270       280       290       300

                 310       320       330
orf7a.pep FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX
          ||||||||||||||||||||||||||||||||
orf7-1    FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX
                 310       320       330
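
The identity figure quoted for ORF7a and ORF7-1 can be reproduced by counting matching positions, since the two sequences align over their full length without gaps. In the Python sketch below, the ORF7-1 string is taken from <SEQ ID 32> and ORF7a is derived from it by applying the four substitutions read from <SEQ ID 34>. The helper names `substitute` and `percent_identity` are illustrative, not part of any analysis package referred to in this text.

```python
# ORF7-1 (SEQ ID 32), 331 residues, stop codon omitted.
ORF7_1 = (
    "MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRK"
    "LAEDRIVFSRHVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGR"
    "PDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSG"
    "NPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNP"
    "YEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAA"
    "YKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY"
    "FVSKMDGTGLSQFSHDLTEHNAAVRKYILKK"
)

# Differences read from the ORF7a listing: (1-based position, old, new).
ORF7A_CHANGES = [(128, "G", "E"), (171, "Q", "R"), (175, "T", "I"), (210, "V", "I")]


def substitute(seq: str, changes) -> str:
    """Apply point substitutions, checking the expected original residue."""
    residues = list(seq)
    for pos, old, new in changes:
        if residues[pos - 1] != old:
            raise ValueError(f"expected {old} at position {pos}")
        residues[pos - 1] = new
    return "".join(residues)


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


ORF7A = substitute(ORF7_1, ORF7A_CHANGES)

if __name__ == "__main__":
    print(f"{percent_identity(ORF7A, ORF7_1):.1f}% identity in {len(ORF7_1)} aa overlap")
```

This simple count is valid only for ungapped comparisons; alignments containing gaps, such as those against the H.influenzae and E.coli proteins, require an alignment step first.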

Homology with a predicted ORF from N.gonorrhoeae

ORF7 shows 94.7% identity over a 187aa overlap with a predicted ORF (ORF7ng) from N.
gonorrhoeae:

orf7 MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60

I I I I M I I II I I M I I II I Ml 1 1 III 1 1 1 1 I 1 1 I 1 1 I I I I II I 1 1 I I 1 1 II 1 1 I I I 1 1 1 
orf7ng MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60

orf7 FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLVEKETG 120

55 M I II I I I I I I It II II M I I I I I I I I II I II I : I I I I I I I I I I I I I I I I I 1:11111 

orf7ng FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEKETG 120 

orf7 HEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLP 180

III I I I I I I I I I I 11 I I M I I I I I I I It I I I I ! I I I I I 1 II I I I I I I I I I I I I I I 
orf7ng HEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGGGLP 180

orf7 PTPIALP 187 
orf7ng PTRIALPGKAAMDAAAHPSGEKYLYFVSKMDGTGLSQFSHDLTEHNAAVRKYILKK 236

An ORF7ng nucleotide sequence <SEQ ID 35> is predicted to encode a protein having amino acid 
sequence <SEQ ID 36>: 

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWAGRQDGL 

101 PYKNPYEMLI MASLIEKETG HEADRDHVAS VFVNRLKIGM RLQTDPSVIY 

151 GMGAAYKGKI RKADLRRDTP YNTYTGGGLP PTRIALPGKA AMDAAAHPSG 

201 EKYLYFVSKM DGTGLSQFSH DLTEHNAAVR KYILKK* 

Further sequence analysis revealed a partial DNA sequence of ORF7ng <SEQ ID 37>:

1 . taccgaatca AGATTGCCAA AAATCAGGGT ATTTCGTCGG TCGGCAGGAA
51 ACTTGCcgaA GACCGCATCG TGTTCAGCAG GCATGTTTTG ACAGCGGCGG
101 CCTACGTTTT GGGTGTGCAC AACAGGCTGC ATACGGGGAC gTACAGATTG
151 CCTTCGGAAG TGTCTGCTTG GGATATCTTG CAGAAAATGC GCGGCGGCAG
201 GCCGGATTCC GTTACCGTGC AGATTATCGA AGGTTCGCGT TTTTCGCATA
251 TGAGGAAAGT CATCGACGCA ACGCCCGACA TCGGACACGA CACCAAAGGC
301 TGGAGCAATG AAAAACTGAT GGCGGAAGTT GCGCCCGATG CCTTCAGCGG
351 CAATCCTGAA GGGCAGTTTT TTCCCGACAG CTACGAAATC GATGCGGGCG
401 GCAGCGATTT GCAGATTTAC CAAACCGCCT ACAAGGCGAT GCAACGCCGC
451 CTGAACGAGG CATGGGCAGG CAGGCAGGAC GGGCTGCCTT ATAAAAACCC
501 TTATGAAATG CTGATTATGG CGAGCCTGAT CGAAAAGGAA ACGGGGCATG
551 AGGCCGACCG CGACCATGTC GCTTCCGTCT TCGTCAACCG CCTGAAAATC
601 GGTATGCGCC TGCAAACCGA CCCGTCCGTG ATTTACGGCA TGGGTGCGGC
651 ATACAAGGGC AAAATCCGTA AAGCCGACCT GCGCCGCGAC ACGCCGTACA
701 aCAccTAtac gggcgggggc ttgccgccaa cccggattgc gctgcccggC
751 Aaggcggcaa tggatgccgc cgcccacccg tccggcgaAa aatacctgTa
801 tttcgtgtcC AAAATGGACG GCACGGGCTT GAGCCAGTTC AGCCATGATT
851 TGACCGAACA CAACGCCGCc gTcCGCAAAT ATATTTTGAA AAAATAA



This corresponds to the amino acid sequence <SEQ ID 38; ORF7ng-1>:

1 . YRIKIAKNQG ISSVGRKLAE DRIVFSRHVL TAAAYVLGVH NRLHTGTYRL
51 PSEVSAWDIL QKMRGGRPDS VTVQIIEGSR FSHMRKVIDA TPDIGHDTKG
101 WSNEKLMAEV APDAFSGNPE GQFFPDSYEI DAGGSDLQIY QTAYKAMQRR
151 LNEAWAGRQD GLPYKNPYEM LIMASLIEKE TGHEADRDHV ASVFVNRLKI
201 GMRLQTDPSV IYGMGAAYKG KIRKADLRRD TPYNTYTGGG LPPTRIALPG
251 KAAMDAAAHP SGEKYLYFVS KMDGTGLSQF SHDLTEHNAA VRKYILKK*



ORF7ng-1 and ORF7-1 show 98.0% identity in 298 aa overlap:






10 20 30 40 50 60 

orf7-l.pep KLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSRHVL 

I I I I I I 1 I I I I I I I I I I i I I 1 II I I I I I II 
orf7ng-l YRIKIAKNQGISSVGRKLAEDRIVFSRHVL 

10 20 30 

70 80 90 100 110 120 

or f 7-1. pep TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA 
i I I I > I I ! I I I I i I I I I i I I I I I I I I I I I M I I I I I I I I 1 I I I I I I I I I I I 1 I I M I I 1 I 
orf7ng-l TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA 
40 50 60 70 80 90 

130 140 150 160 170 180 

orf7-l.pep T PDIGHDTKGWSNEKLMAE VAPDAFSGN PEGQFFPDS YEI DAGGS DLQI YQTAYKAMQRR 
I I II | | | I I I I I I I I I I I II ! I II I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I M I I 
orf7ng-l TPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEI DAGGS DLQI YQTAYKAMQRR 

100 110 120 130 140 150 

190 200 210 220 230 240 

orf7-l .pep LNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV 
Mill : I I I I I I I I I I I I M I I I I I : I I I I I I I I I I II II I I I I I I I I I I I II I 1 I I I I 
orf7ng-l LNEAWAGRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV 
160 170 180 190 200 210 

250 260 270 280 290 300 

orf7-l.pep IYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVS 
           ||||||||||||||||||||||||||| |||||| ||||||||:||||||||||||||||
orf7ng-1   IYGMGAAYKGKIRKADLRRDTPYNTYTGGGLPPTRIALPGKAAMDAAAHPSGEKYLYFVS
220 230 240 250 260 270 



310 320 330 

orf7-1.pep KMDGTGLSQFSHDLTEHNAAVRKYILKKX
           |||||||||||||||||||||||||||||
orf7ng-1   KMDGTGLSQFSHDLTEHNAAVRKYILKKX
                   280       290



In addition, ORF7ng-1 shows significant homology with a hypothetical E.coli protein:

splP28306IYCEG EC0LI HYPOTHETICAL 38.2 KD PROTEIN IN !^"^ 0 ™ GE J^ 3 of ^ 
gi|1787339 (AE000210) o340; 100% identical to fragment YCEG_ECOLI SW: P28306 but
has 97 additional C-terminal residues [Escherichia coli] Length = 340

Score = 79 (36.2 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57
Identities = 20/87 (22%), Positives = 40/87 (45%)

Query:  10 GISSVGRKLAEDRIVFSRHVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPD 69
           G  ++G +L  D+I+    V      +    +    GTYR   +++  ++L+ +  G+
Sbjct:  49 GRLALGEQLYADKIINRPRVFQWLLRIEPDLSHFKAGTYRFTPQMTVREMLKLLESGKEA 108

Query:  70 SVTVQIIEGSRFSHMRKVIDATPDIGH 96
              ++++EG R S   K +   P I H
Sbjct: 109 QFPLRLVEGMRLSDYLKQLREAPYIKH 135

Score = 438 (200.7 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57
Identities = 84/155 (54%), Positives = 111/155 (71%)

Query: 120 EGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEK 179
           EG F+PD++   A  +D+ + + A+K M + ++ AW GR DGLPYK+  +++ MAS+IEK
Sbjct: 158 EGTFWPDTWMYTANTTDVALLKRAHKKMVKAVDSAWEGRADGLPYKDKNQLVTMASIIEK 217

Query: 180 ETGHEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGG 239
           ET   ++RD VASVF+NRL+IGMRLQTDP+VIYGMG  Y GK+ +ADL   T YNTYT
Sbjct: 218 ETAVASERDKVASVFINRLRIGMRLQTDPTVIYGMGERYNGKLSRADLETPTAYNTYTIT 277

Query: 240 GLPPTRIALPGKAAMDAAAHPSGEKYLYFVSKMDG 274
           GLPP  IA PG  ++ AAAHP+   YLYFV+   G
Sbjct: 278 GLPPGAIATPGADSLKAAAHPAKTPYLYFVADGKG 312

Based on this analysis, including the fact that the H.influenzae YCEG protein possesses a possible
leader sequence, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 6 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 39>:

1 CGTTTCAAAA TGTTAACTGT GTTGACGGCA ACCTTGATTG CCGGACAGGT 

51 ATCTGCCGCC GGAGGCGGTG CGGGGGATAT GAAACAGCCG AAGGAAGTCG 

101 GAAAGGTTTT CAGAAAGCAG CAGCGTTACA GCGAGGAAGA AATCAAAAAC 

151 GAACGCGCAC GGCTTGCGGC AGTGGGCGAG CGGGTTAATC AGATATTTAC 

201 GTTGCTGGGA GGGGAAACCG CCTTGCAAAA GGGGCAGGCG GGAACGGCTC

251 TGGCAACCTA TATGCTGATG TTGGAACGCA CAAAATCCCC CGAAGTCGCC

301 GAACGCGCCT TGGAAATGGC CGTGTCGCTG AACGCGTTTG AACAGGCGGA 

351 AATGATTTAT CAGAAATGGC GGCAGATTGA GCCTATACCG GGTAAGGCGC 

401 AAAAACGGGC GGGGTGGCTG CGGAACGTGC TGAGGGAAAG AGGAAATCAG 

451 CATCTGGACG GACGGGAAGA AGTGCTGGCT CAGGCGGACG AAGGACAG

This corresponds to the amino acid sequence <SEQ ID 40; ORF9>: 

1 RFKMLTVLTA TLIAGQVSAA GGGAGDMKQP KEVGKVFRKQ QRYSEEEIKN
51 ERARLAAVGE RVNQIFTLLG GETALQKGQA GTALATYMLM LERTKSPEVA
101 ERALEMAVSL NAFEQAEMIY QKWRQIEPIP GKAQKRAGWL RNVLRERGNQ
151 HLDGREEVLA QADEGQ

Further sequence analysis revealed the complete DNA sequence <SEQ ID 41>: 

1 ATGTTACCTA ACCGTTTCAA AATGTTAACT GTGTTGACGG CAACCTTGAT 

51 TGCCGGACAG GTATCTGCCG CCGGAGGCGG TGCGGGGGAT ATGAAACAGC 

101 CGAAGGAAGT CGGAAAGGTT TTCAGAAAGC AGCAGCGTTA CAGCGAGGAA

151 GAAATCAAAA ACGAACGCGC ACGGCTTGCG GCAGTGGGCG AGCGGGTTAA 

201 TCAGATATTT ACGTTGCTGG GAGGGGAAAC CGCCTTGCAA AAGGGGCAGG 

251 CGGGAACGGC TCTGGCAACC TATATGCTGA TGTTGGAACG CACAAAATCC 

301 CCCGAAGTCG CCGAACGCGC CTTGGAAATG GCCGTGTCGC TGAACGCGTT 

351 TGAACAGGCG GAAATGATTT ATCAGAAATG GCGGCAGATT GAGCCTATAC

401 CGGGTAAGGC GCAAAAACGG GCGGGGTGGC TGCGGAACGT GCTGAGGGAA 

451 AGAGGAAATC AGCATCTGGA CGGACTGGAA GAAGTGCTGG CTCAGGCGGA 

501 CGAAGGACAG AACCGCAGGG TGTTTTTATT GTTGGCACAA GCCGCCGTGC 

551 AACAGGACGG GTTGGCGCAA AAAGCATCGA AAGCGGTTCG CCGCGCGGCG 

601 TTGAAATATG AACATCTGCC CGAAGCGGCG GTTGCCGATG TGGTGTTCAG

651 CGTACAGGGA CGCGAAAAGG AAAAGGCAAT CGGAGCTTTG CAGCGTTTGG 

701 CGAAGCTCGA TACGGAAATA TTGCCCCCCA CTTTAATGAC GTTGCGTCTG

751 ACTGCACGCA AATATCCCGA AATACTCGAC GGCTTTTTCG AGCAGACAGA

801 CACCCAAAAC CTTTCGGCCG TCTGGCAGGA AATGGAAATT ATGAATCTGG 

851 TTTCCCTGCA CAGGCTGGAT GATGCCTATG CGCGTTTGAA CGTGCTGTTG

901 GAACGCAATC CGAATGCAGA CCTGTATATT CAGGCAGCGA TATTGGCGGC 

951 AAACCGAAAA GAAGGTGCTT CCGTTATCGA CGGCTACGCC GAAAAGGCAT 

1001 ACGGCAGGGG GACGGAGGAA CAGCGGAGCA GGGCGGCGCT AACGGCGGCG 

1051 ATGATGTATG CCGACCGCAG GGATTACGCC AAAGTCAGGC AGTGGCTGAA 

1101 AAAAGTATCC GCGCCGGAAT ACCTGTTCGA CAAAGGTGTG CTGGCGGCTG

1151 CGGCGGCTGT CGAGTTGGAC GGCGGCAGGG CGGCTTTGCG GCAGATCGGC 

1201 AGGGTGCGGA AACTTCCCGA ACAGCAGGGG CGGTATTTTA CGGCAGACAA 

1251 TTTGTCCAAA ATACAGATGC TCGCCCTGTC GAAGCTGCCC GATAAACGGG 

1301 AGGCTTTGAG GGGGTTGGAC AAGATTATCG AAAAACCGCC TGCCGGCAGT 

1351 AATACAGAGT TACAGGCAGA GGCATTGGTA CAGCGGTCAG TTGTTTACGA

1401 TCGGCTTGGC AAGCGGAAAA AAATGATTTC AGATCTTGAA AGGGCGTTCA 

1451 GGCTTGCACC CGATAACGCT CAGATTATGA ATAATCTGGG CTACAGCCTG 

1501 CTGACCGATT CCAAACGTTT GGACGAAGGT TTCGCCCTGC TTCAGACGGC 

1551 ATACCAAATC AACCCGGACG ATACCGCTGT CAACGACAGC ATAGGCTGGG 

1601 CGTATTACCT GAAAGGCGAC GCGGAAAGCG CGCTGCCGTA TCTGCGGTAT

1651 TCGTTTGAAA ACGACCCCGA GCCCGAAGTT GCCGCCCATT TGGGCGAAGT 

1701 GTTGTGGGCA TTGGGCGAAC GCGATCAGGC GGTTGACGTA TGGACGCAGG 

1751 CGGCACACCT TACGGGAGAC AAGAAAATAT GGCGGGAAAC GCTCAAACGT 

1801 CACGGCATCG CATTGCCCCA ACCTTCCCGA AAACCTCGGA AATAA 

This corresponds to the amino acid sequence <SEQ ID 42; ORF9-1>:

1 MLPNRFKMLT VLTATLIAGQ VSAAGGGAGD MKQPKEVGKV FRKQQRYSEE

51 EIKNERARLA AVGERVNQIF TLLGGETALQ KGQAGTALAT YMLMLERTKS 

101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGKAQKR AGWLRNVLRE

151 RGNQHLDGLE EVLAQADEGQ NRRVFLLLAQ AAVQQDGLAQ KASKAVRRAA 

201 LKYEHLPEAA VADVVFSVQG REKEKAIGAL QRLAKLDTEI LPPTLMTLRL

251 TARKYPEILD GFFEQTDTQN LSAVWQEMEI MNLVSLHRLD DAYARLNVLL 

301 ERNPNADLYI QAAILAANRK EGASVIDGYA EKAYGRGTEE QRSRAALTAA 

351 MMYADRRDYA KVRQWLKKVS APEYLFDKGV LAAAAAVELD GGRAALRQIG 

401 RVRKLPEQQG RYFTADNLSK IQMLALSKLP DKREALRGLD KIIEKPPAGS 

451 NTELQAEALV QRSVVYDRLG KRKKMISDLE RAFRLAPDNA QIMNNLGYSL

501 LTDSKRLDEG FALLQTAYQI NPDDTAVNDS IGWAYYLKGD AESALPYLRY 

551 SFENDPEPEV AAHLGEVLWA LGERDQAVDV WTQAAHLTGD KKIWRETLKR 

601 HGIALPQPSR KPRK* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N.meningitidis (strain A)

ORF9 shows 89.8% identity over a 166aa overlap with an ORF (ORF9a) from strain A of N. 
meningitidis: 
orf9.pep RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA
orf9a MLPARFTILSVLAAALLAGQAYAA--GAADAKPPKEVGKVFRKQQRYSEEEIKNERARLA
orf9.pep



120 130 I 40 150 160 

EMI YQKWRQI E PI PGKAQKRAGW LRNVLRERGNQHLDGREE VLAQADEGQ 
1 ti 1 1 1 t 1 I M I I I | I I It 1 I I < M I t I I 1 I I I I t I I I U I I M I I I 
orf9a EMIYQKTOQIEPIPGKAQKR^^ 

°" 120 130 I 40 150 160 170 

orf9a AAVQQIXSIAQKASKAVRRAALRYEHLPEAAVADWFSVQXREKEKAIGALQ 
180 190 200 210 220 230 

The complete length ORF9a nucleotide sequence <SEQ ID 43> is: 

1 ATGTTACCCG CCCGTTTCAC CATTTTATCT GTGCTCGCGG CAGCCCTGCT 
51 TGCCGGGCAG GCGTATGCCG CCGGCGCGGC GGATGCGAAG CCGCCGAAGG 
101 AAGTCGGAAA GGTTTTCAGA AAGCAGCAGC GTTACAGCGA GGAAGAAATC 
151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAGCGGG TTAATCAGAT 
201 ATTTACGTTG CTGGGANGGG AAACCGCCTT GCAAAAGGGG CAGGCGGGAA 
251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA 
301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCNCTGAACG CGTTTGAACA 
351 GGCGGAAATG ATTTATCAGA AATGGCGGCA GATTGAGCCT ATACCGGGTA 
401 AGGCGCAAAA ACGGGCGGGG TGGCTGCGGA ACGTGCTGAG GGAAAGAGGA 
451 AATCAGCATC TAGACGGACT GGAAGAANTG CTGGCTCAGG CGGACGAANG 
501 ACAGAACCGC AGGGTGTTTT TATTGTTGGC ACAAGCCGCC GTGCAACAGG 
551 ACGGGTTGGC GCAAAAAGCA TCGAAAGCGG TTCGCCGCGC GGCGTTGAGA 
601 TATGAACATC TGCCCGAAGC GGCGGTTGCC GATGTGGTGT TCAGCGTACA 
651 GGNACGCGAA AAGGAAAAGG CAATCGGAGC TTTGCAGCGT TTGGCGAAGC
701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG ^CTGACTGCA 
751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA
801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC 
851 TGCACAGGCT GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACGC 
901 AATCCGAATG CAGACCTGTA TATTCAGGCA GCGATATTGG CGGCAAACCG 
951 AAAAGAANGT GCTTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA 
1001 GGGGGACGGG GGAACAGCGG GGCAGGGCGG CAATGACGGC GGCGATGATA 
1051 TATGCCGACC GAAGGGATTA CACCAAAGTC AGGCAGTGGT TGAAAAAAGT 
1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG TGTGCTGGCG GCTGCGGCGG 
1151 CTGTCGAGTT GGACNGCGGC AGGGCGGCTT TGCGGCAGAT CGGCAGGGTG 
1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC 
1251 CAAAATACAG ATGTTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAGGCTT 
1301 TGAGGGGGTT GGACAAGATT ATCGAAAAAC CGCCTGCCGG CAGTAATACA 
1351 GAGTTACAGG CAGAGGCATT GGTACAGCGG TCAGTTGTTT ACGATCGGCT 
1401 TGGCAAGCGG AAAAAAATGA TTTCAGATCT TGAAAGGGCG TTCAGGCTTG 
1451 CACCCGATAA CGCTCAGATT ATGAATAATC TGGGCTACAG CCTGCTTTCC 
1501 GATTCCAAAC GTTTGGACGA AGGCTTCGCC CTGCTTCAGA CGGCATACCA 
1551 AATCAACCCG GACGATACCG CTGTCAACGA CAGCATAGGC TGGGCGTATT 
1601 ACCTGAAANG CGACGCGGAA AGCGCGCTGC CGTATCTGCG GTATTCGTTT 
1651 GAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG 
1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC 
1751 ACCTTACGGG AGACAAGAAA ATATGGCGGG AAACGCTCAA ACGTCACGGC
1801 ATCGCATTGC CCCAACCTTC CCGAAAACCT CGGAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 44>: 

1 MLPARFTILS VLAAALLAGQ AYAAGAADAK PPKEVGKVFR KQQRYSEEEI

51 KNERARLAAV GERVNQIFTL LGXETALQKG QAGTALATYM LMLERTKSPE 

101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGKAQKRAG WLRNVLRERG

151 NQHLDGLEEX LAQADEXQNR RVFLLLAQAA VQQDGLAQKA SKAVRRAALR 

201 YEHLPEAAVA DVVFSVQXRE KEKAIGALQR LAKLDTEILP PTLMTLRLTA

251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLHRLDDA YARLNVLLER 

301 NPNADLYIQA AILAANRKEX ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI

351 YADRRDYTKV RQWLKKVSAP EYLFDKGVLA AAAAVELDXG RAALRQIGRV 

401 RKLPEQQGRY FTADNLSKIQ MFALSKLPDK REALRGLDKI IEKPPAGSNT 

451 ELQAEALVQR SWYDRLGKR KKMISDLERA FRLAPDNAQI MNNLGYSLLS 

501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKXDAE SALPYLRYSF 

551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLTGDKK IWRETLKRHG 
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601 IALPQPSRKP RK* 

ORF9a and ORF9-1 show 95.3% identity in 614 aa overlap: 

10 20 30 40 50 

orf9a oeo MLPARFTILSVIAAALIAGQAYAAG--AADAKPPKEVGKVFRKQQRYSEEEIKNERARLA 

or f 9- 1 mlpNRFKMLTVLTATLI AGQVSAAGGGAGDMKQPKEVGKVFRKQQRVSEEEIKNERARLA 

10 20 30 40 50 60 

60 70 80 90 100 

orf9a Deo AVGERVNQIFTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 
orf pep imilllllllll I I i 1 1 1 1 1 I I I I I 1 I I I I 1 I I t I I I t I I I I 1 I t 1 1 I 1 1 1 I I I I I J 
o-f9-l A VGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLKAFEQA 
70 80 90 100 HO 120 

120 130 140 150 160 1"?0 

orf 9a Dep EM IYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEXLAQADEXQNRRVFLLLAQ 

I M II 1 1 1 K I I 1 1 > 1 1 1 1 1 1 M 1 1 1 1 1 M i 1 1 1 K I I 1 1 I 1 I MUM M 1 1 M M M I 
EMI YGKWRQIEPI PGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ 
130 140 150 160 170 180 



orf9-l 



180 190 200 210 220 230 

AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADWFSVQXREKEKAIGALQRLAKLDTEI 
UN! mMMMIMMMMMIMMMIMMM i I I I II I I 1 i M I M 1 M I I 
nr f C .i AAVQODGLAQKASKAVRRAALKYEHLPEAAVADW FSVQGREKEKAIGALQRLAKLDTEI 

° 190 200 210 220 230 240 



orf 9a. pep 



orf 9a .pep 
orf9-l 

orf 9a. pep 



240 250 260 270 280 290 

LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 

HIM 111 II II limilMIIMIMMMMMIMMMMIIMMMM 

LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 
250 260 270 280 290 300 

300 310 320 330 340 350 

ERNPNADLYIQAAILAANRKEXASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYT 

I I I I I 1 | I M | I I I 1 I I I 1 1 1 I i I I 1 I i II I II I M I I M : M I : I II I : M M I M : 



nr f c- 1 ERN PN ADLYI QAAILAANRKEGASV I DG YAEKAYGRGTEEQRSRAALTAAMMYADRRDYA 

31 0 320 330 340 350 360 

360 370 380 390 400 410 

KVRQWIJCKVSAPEYLFDKGVLAAAAAVELDXGRAALRQIGRVRKLPEQQGRYFTADNLSK 

IIIIIMMIMIMMMMIIIMMII M II M M 1 II I M M Ml I I M II I I \ I 
KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 

370 380 390 400 410 420 



orf 9a. pep 
orf9-l 



420 430 440 450 460 470 

orf 9a oeo tqmfalSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSWYDRLGKRKKMISDLE 
P P III: 1 1 1 1 I M M 1 1 M I M II M I II M M I I M M I II I 1 1 M M M M M M I M M 
orf 9-1 iqmlALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSWYDRLGKRKKMISDLE 
430 440 450 460 470 480 

480 490 500 510 520 530 

RAFRLAPDNAQIMNNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKXD 

I ! 1 1 I 1 I I I f t 1 I I 1 1 I I I I I : 1 I I I I I I I M M M I I I I I I I I t M I I I I I t t I M I I 
.. A ft A iot ow^t t mnevoT nrrrat t n^avnTMPnnTJVVNriSTCTIAYYLKGD 



orf 9a .pep 



orf 9-1 RAFRLAPDNAQIMNNLGYSLLTDSKRLDEGFALLQTAYQINPDDTAVKDSIGWAYYLKGD 
490 500 510 520 530 540 

540 550 560 570 580 590 

orfQa oeo AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 
" P F 1 1 1 1 | | 1 1 1 | | | | 1 1 1 1 1 1 1 1 1 1 1 I M I I II I M I II M I I I I I II M I II 1 1 M M M I 
orf 9-1 AESALPYLRYS FENDPEPEVAAH W3EVLW ALGER DQAVDVWTQAAHLTGDKKIWRET LKR 

550 560 570 580 590 600 

600 610 
o-f 9a . pep HGIALPQPSRKPRKX 
! 1 1 1 1 1 1 1 I ! 1 1 1 M 
orf 9-1 HGIALPQPSRKPRKX 
610

Homology with a predicted ORF from N.gonorrhoeae

ORF9 shows 82.8% identity over a 163aa overlap with a predicted ORF (ORF9.ng) from N. 
gonorrhoeae: 

o rf9 RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERAR 54 

5 ° r ,, : ,:||:l:l:ltl: II ||:|:: 1 1 1 1 II 1 : 1 1 :: I I I I I I M 1 1 1 I J 

orf9ng MIMLPARFTILSVLAAALLAGQAYAA — GAADVELPKEVGKVLRKHRRYSEEEIKNERAR 58 

or f 9 LAAVGERVNQI FTLLGGETALQKGQAGTALATYMLMLERTKS PEVAERALEMAVSLNAFE 1 1 4 

M||lllll::MlMMillll1lliltMlMIIMHIIItlllllllM II 

1 0 or f 9ng LAAVGERVNRVFTLLGGETALQKGQAGTALATYMLMLERTKS PEVAERALEMAVSLNAFE 1 1 8 

or f 9 QAEMI YQKWRQI E P I PGKAQKRAGWLRNVLRERGNQHLDGREEVLAQADEGQ 166 

I II I II III Ml Mill: III II I II III: I M HI Hlll:l 
orf 9ng QAEMIYQKWRQIEPIPGEAQKPAGWLRNVLKEGGNPKLDRLEEVPAQSDYVHQPMIFLLL 178 

The ORF9ng nucleotide sequence <SEQ ID 45> was predicted to encode a protein having amino
acid sequence <SEQ ID 46>:

1 MIMLPARFTI LSVLAAALLA GQAYAAGAAD VELPKEVGKV LRKHRRYSEE
51 EIKNERARLA AVGERVNRVF TLLGGETALQ KGQAGTALAT YMLMLERTKS
101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGEAQKP AGWLRNVLKE
151 GGNPHLDRLE EVPAQSDYVH QPMIFLLLVQ AAVQHGGVAQ KPSKAVRPAA
201 YNYEVLPETA GADAVFCVQG PQYEKAIQSF PPCGRNPQTE NIAPPFNELF
251 RPTARPISPK LLQRFFRTEP NLAKPFRPPG PEMETYQTGF PRPLTRNNPT

Amino acids 1-28 are a putative leader sequence, and 173-189 are predicted to be a transmembrane 
domain. 
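The residue ranges quoted above can be read off the sequence directly. The following is a minimal sketch, assuming 1-based inclusive numbering as used in the sequence listing; the helper name `region` is hypothetical, and only the first 200 residues of <SEQ ID 46> are included.

```python
# First 200 residues of <SEQ ID 46>, with the row numbers and spacing removed.
SEQ_ID_46_START = (
    "MIMLPARFTILSVLAAALLAGQAYAAGAADVELPKEVGKVLRKHRRYSEE"
    "EIKNERARLAAVGERVNRVFTLLGGETALQKGQAGTALATYMLMLERTKS"
    "PEVAERALEMAVSLNAFEQAEMIYQKWRQIEPIPGEAQKPAGWLRNVLKE"
    "GGNPHLDRLEEVPAQSDYVHQPMIFLLLVQAAVQHGGVAQKPSKAVRPAA"
)


def region(seq: str, first: int, last: int) -> str:
    """Return residues first..last of seq (1-based, inclusive)."""
    if not 1 <= first <= last <= len(seq):
        raise ValueError("residue range outside the sequence")
    return seq[first - 1:last]


leader = region(SEQ_ID_46_START, 1, 28)           # putative leader sequence
transmembrane = region(SEQ_ID_46_START, 173, 189) # predicted transmembrane domain
print(leader)
print(transmembrane)
```

The second slice is a 17-residue stretch made up mostly of hydrophobic residues, which fits the transmembrane prediction stated in the text.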

Further sequence analysis revealed the complete length ORF9ng DNA sequence <SEQ ID 47>:

1 ATGTTACCCG CCCGTTTCAC TATTTTATCT GTCCTCGCAG CAGCCCTGCT 

51 TGCCGGACAG GCGTATGCTG CCGGCGCGGC GGATGTGGAG CTGCCGAAGG 

101 AAGTCGGAAA GGTTTTAAGG AAACATCGGC GTTACAGCGA GGAAGAAATC 

151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAACGGG TCAACAGGGT 

201 GTTTACGCTG TTGGGCGGTG AAACGGCTTT GCAGAAAGGG CAGGCGGGAA

251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA 

301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCGCTGAACG CGTTTGAACA 

351 GGCGGAAATG ATTTATCAGA AATGgcggca gatcgagcct ataCcgggtg 

401 aggcgcaaaa accgGcgggG tggctgcgga acgtattgaa ggaagggGGa

451 aaTCAGCATC TGGAcgggtt gaaagaggTG CtggcgcaAT cggacgatGT

501 GCAAAAAcgc aggaTATTTT TGCTGCTGGT GCAAGCCGCC GTGCagcagg 

551 gTGGGGTGGC TCAAAAAGCA TCGAAAGCGG TTCGCcgtgc GGcgttgaAG 

601 TATGAACATC TGCCcgaagc ggcggTTGCC GATGcggTGT TCGGCGTACA 

651 GGGACGCGAA AAGGAAAagg caaTCGAAGC TTTGCAGCGT TTGGCGAAGC 

701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA

751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA 

801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC 

851 TGCGTAAGCC GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACAC 

901 AACCCGAATG CAAACCTGTA TATTCAGGCG GCGATATTGG CGGCAAACCG 

951 AAAAGAAGGT GCGTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA

1001 GGGGGACGGG GGAACAGCGG GGCagggcgg cAATgacggc GGCGATGATA 

1051 TATGCCGACC GCAGGGATTA CGCCAAAGTC AGGCAGTGGT TGAAAAAAGT 

1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG CGTGCTGGCG GCTGCGGCGG 

1151 CTGCCGAATT GGACGGAGGC CGGGCGGCTT TGCGGCAGAT CGGCAGGGTG 

1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC

1251 CAAAATACAG ATGCTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAAGCCC 

1301 TGATCGGGCT GAACAACATC ATCGCCAAAC TTTCGGCGGC GGGAAGCACG 

1351 GAACCTTTGG CGGAAGCATT GGCACAGCGT TCCATTATTT ACGaacAGTT 

1401 cggCAAACGG GGAAAAATGA TTGCCGACCT tgaAACcgcg CTCAAACTTA 

1451 CGCCCGATAA TGCACAAATT ATGAATAATC TGGGCTACAG CCTGCTTTCC

1501 GATTCCAAAC GTTTGGACGA GGGTTTCGCC CTGCTTCAGA CGGCATACCA 

1551 AATCAACCCG GACGATACCG CCGTTAACGA CAGCATAGGC TGGGCGTATT 

1601 ACCTGAAAGG CGACgcggaA AGCGCGCTGC CGTATCTGcg gtattcgttt 

1651 gAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG 






1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC 
1751 ACCTTAGGGG AGACAAGAAA ATATGGCGGG AGACGCTCAA ACGCTACGGA
1801 ATCGCCTTGC CCGAGCCTTC CCGAAAACCC CGGAAATAA 



This encodes a protein having amino acid sequence <SEQ ID 48>: 



1 MLPARFTILS VLAAALLAGQ AYAAGAADVE LPKEVGKVLR KHRRYSEEEI
51 KNERARLAAV GERVNRVFTL LGGETALQKG QAGTALATYM LMLERTKSPE
101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGEAQKPAG WLRNVLKEGG
151 NQHLDGLKEV LAQSDDVQKR RIFLLLVQAA VQQGGVAQKA SKAVRRAALK
201 YEHLPEAAVA DAVFGVQGRE KEKAIEALQR LAKLDTEILP PTLMTLRLTA
251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLRKPDDA YARLNVLLEH
301 NPNANLYIQA AILAANRKEG ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI
351 YADRRDYAKV RQWLKKVSAP EYLFDKGVLA AAAAAELDGG RAALRQIGRV
401 RKLPEQQGRY FTADNLSKIQ MLALSKLPDK REALIGLNNI IAKLSAAGST
451 EPLAEALAQR SIIYEQFGKR GKMIADLETA LKLTPDNAQI MNNLGYSLLS
501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKGDAE SALPYLRYSF
551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLRGDKK IWRETLKRYG
601 IALPEPSRKP RK*



ORF9ng and ORF9-1 show 88.1% identity in 614 aa overlap:

orf9-1.pep MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA
orf9ng-1   MLPARFTILSVLAAALLAGQAYAAG--AADVELPKEVGKVLRKHRRYSEEEIKNERARLA

10 20 30 40 50 

70 80 90 100 110 120 

orf 9-1 pep AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

I | | I | | | :: I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I M I I I 1 I M I I I II I I I 
or f 9nq - 1 AVGERVNRVFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

60 "70 80 90 100 110 

130 140 150 160 1*70 180 

orf 9- 1 oep EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ 
| U || | | | | | || | II: I I I (1111111:1 11111111:11111:1: 1:11:1111=1 
orf9nq-l EMIYQKWROIEPIPGEAQKPAGWLRNVLKEGGNQHLDGLKEVLAQSDDVQKRRIFLLLVQ 
^" " 120 130 140 150 160 170 

190 200 210 220 230 240 

or f 9-i Pep AAVQQDGIAQKASKAVRRAALKYEHLPEAAVADWFSVQGREKEKAIGALQRI^KLDTEI 

* p p mil i = 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 f 1 1 1 1 1 1 1 = 1 1 = 1 1 1 1 1 1 1 1 1 1 illinium 

orf 9na- 1 AAVQQGGVAQKASKAVRRAALKYEHLPEAAVADAVFGVQGREKEKAIEALQRLAKLDTEI 
1BO 190 200 210 220 230 

250 260 270 280 290 300 

orf 9-1 pep LPPTI^I^LTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 

imiimiiiiiMitiiminiMinimMimnii:: itiiiMiin 

orf9na-l LPPTU*4TLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLRKPDDAYARLNVLL 
y 240 250 260 270 280 290 

310 320 330 340 350 360 

orf 9-1 pep ERNPNADLYIQAAIIAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMKYADRRDYA 
|:| | | 1 : 1 1 1 1 1 1 1 1 1 1 I 1 1 I I I 1 1 1 1 I I I I I 1 1 1 1 1 1 lll:lll:llll:IIIIH II 
orf9nq-l EHNPNANLYIQAAILAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYA 
" ^ " 300 310 320 330 340 350 

3*70 380 390 400 410 420 

orf9-l pep KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 
| | | | | | | | | | | | | II I I I I I I I M I I : I I II I 1 I I I M I I M I I I I I 1 I II I I I I I M I I 
orf9nq-l KVRQWLKKVSAPEYLFDKGVLAAAAAAELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 
360 370 380 390 400 410 

430 440 450 460 470 480 

orf9-l pep IQMIJ^SKLPDKREAIAGLDKIIEKPPAGSNTEW^ 

MINI Mill I l::ll I l:::ll I II I : I I I : : I : : : I I I 111:111 

orf9nq-l IQMU^SKLPDKREALIGI^NIIAKLSAAGSTEPIAEALAQRSIIYEQFGKRGKMIADLE 
420 430 440 450 460 470 



490 500 510 520 530 540

orf9-l pep RAFRIAPDNAQI^LGYSLLTDSKIUDEGFALLQTA^^ 

r^*«n fl -l TALKLTPDNAQIMNNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 
° rryng 480 490 500 510 520 530 

550 560 570 580 590 600 

rwDj/^.ir /lrysfendpepevaahlgevlwalgerdqavdvwtqaahltgdkkiwretlkr 
lltllltlSlllllllttlltlllllltllllllillltSlllllll mm miim 



550 560 3ou oy\j 

orf 9-1 oep aesalpylrysfendpepevaahwevlwalgerdqavd^qaahltgdkkiwretl^ 
orf9 l.pep 1 1 1 1 1 1 1 1 1 1 1 i 1 1 i i 1 1 I 1 1 1 1 I M 1 1 I 1 1 1 1 I 1 1 M M I I I 1 1 M 1 1 1 II I 1 1 1 1 M 
ftrf q nfl i aesalpylrysfendpepevaahlgevlwalgerdqavdvwtqaahlrgdkkiwretlkb 

orisng ^ ^ 57Q 58Q 59Q 



610 

orf 9-1 .pep HGIALPQPSRKPRKX 
: I I I i I : I I I I I I 1 I 
15 orf9ng-l YGIALPEPSRKPRKX 

600 610 



In addition, ORF9ng shows significant homology with a hypothetical protein from P.aeruginosa:

sp|P42810|YHE3_PSEAE HYPOTHETICAL 64.8 KD PROTEIN IN HEMM-HEMA INTERGENIC REGION

>gi|1072999|pir||S49376 hypothetical protein 3 - Pseudomonas aeruginosa >gi|557259
(X82071) orf 3 [Pseudomonas aeruginosa] Length = 576

Identities' SS/sJ? 1 U3%?rPosiui:s 2 ! 228/587 (38%), Gaps - 125/587 (21%) 



60 



95 Ouerv 67 VFTLLGGETALQKGQAGT ALAT YMLMLERTKS PEVAERALEMAV S LNAFEQAEMI YQKWR 126 

*' +++LL E A Q+ + AL+ Y++ ++T+ P V+ERA +A L A ++A W 

Sbjct: 53 LYSLLVAELAGQRNRFDIALSNYWQAQKTRDPGVSERAFRIAEYLGADQEALDTSLLWA 112 

Ouerv 127 QIEPI PGEAQKPAG WLRNVLKEGGNOHLDGLKEVLAQSDDVQKRRI 172 

+ P +AQ+ A ++ VL G+ H D L A++D ♦ + 

Sbjct: 113 RSAPDNLDAQRAAAIQLARAGRYEESMVYMEKVLNGQGDTHFDFLALSAAETDPDTRAGL 172 

Ouerv 173 FXXXXXXXXXXXXXXXKASKAVRRAAIJCYEHLPEAAVADAVFGVQGREKEKAIEALQRLA 232 
U y * ++ KY + + A+ Q ++A+ L+ + 

35 Sbjct: 173 L QSFDHLLKKYPNNGQLLFGKALLLQQDGRPDEALTLLEDNS 214 

Ouerv 233 KLDTEILPPTLMTLRLTARK YPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLRKP 287 

y * E+PL+L + K P+GED + ++ + LV + 
Sbjct: 215 ASRHEVAPLLLRSRLLQSMKRSDEALPLLKAGIKEHPDDKRVRLAYARL LVEQNRL 270 

Query: 288 DDAYARLNVLLEHNPN ANLYIQAAI 312 

DDA A L++ P+ A +Y++ + 

Sbjct: 271 DDAKAEFAGLVQQFPDDDDDLRFSLALVCLEAQAWDEARIYLEELVERDSHVDAAHFNLG 330 

45 Ouerv 313 -LAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYAKVRQWLKKVSAPE 371 

Query. ^ +K+ A +D y A+ G G + T ++ A R D A R + P+ 

Sbjct: 331 RLAEEQKDT ARALDE YAQ — VG PGN DFLPAQLRQT DVLLKAGRVDEAAQRLDKARSEQPD 388 

Query 372 YLFDKXXXXXXXXXXXXXXXXXXRQIGRVRKLPEQQGRYFTADNLSKIQMLALSKLPDKR 431 
ca Y A L 1+ ALS + 

Sbjct: 389 Y AIQLYLIEAEALSNNDQQE 408 

Ouerv 432 EALIGLNNIIAKLSAAGSTEPLAEALAQRS 1 1 YEQFGKRGKMIADLETALKLTPDNAQIM 491 
yuery. +ft ^ + + + £ L L RS++ E+ +M DL + PDNA + 

55 Sbjct: 409 KAWQAIQEGLKQYP EDL-NLLYTRSMLAEKRNDLAQMEKDLRFVIAREPDNAMAL 462 

Query 492 NNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGDAESALPYLRYSFE 551 

N LGY+L + R E L+ A+++NPDD A+ DS+GW Y +G A YLR + + 
Sbjct: 463 NALGYTLADRTTRYGEAREL I LKAHKLN P DDPAI LDSMGW I N YRQGKLADAERY LRQALQ 522 

Query 552 NDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLRGDKKIWRETLKR 598 

P+ EVAAHLGEVLWA G+A+W+ +D+R T+KR 
Sbjct: 523 RYPDHEVAAHLGEVLWAQGRQGDARAIWREYLDKQPDSDVLRRTIKR 569 
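The summary counts printed with such an alignment can be checked against their percentages. The sketch below is an illustration only: the helper `parse_counts` is hypothetical, and the example string is a cleaned-up reading of the legible part of the summary line for the P.aeruginosa hit. It assumes the `name = n/d (p%)` layout and tests whether each percentage equals the ratio truncated to a whole number.

```python
import re

# Legible part of the summary line for the P.aeruginosa hit.
SUMMARY = "Positives = 228/587 (38%), Gaps = 125/587 (21%)"

FIELD = re.compile(r"(\w+)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_counts(line: str) -> dict:
    """Map each field name to a (count, alignment_length, percent) tuple."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD.findall(line)
    }


for name, (count, length, percent) in parse_counts(SUMMARY).items():
    # Integer division truncates, e.g. 228 * 100 // 587 is 38 although 228/587 is about 38.8%.
    print(name, count * 100 // length == percent)
```

For these two fields the printed percentages agree with truncation rather than rounding, which is worth keeping in mind when comparing them with identity values quoted to one decimal place elsewhere in the text.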

gi|2983399 (AE000710) hypothetical protein [Aquifex aeolicus] Length = 545

Score = 81.5 bits (198), Expect = 1e-14
Identities = 61/198 (30%), Positives = 98/198 (48%), Gaps = 19/198 (9%)

Query: 408 GRYFTADNL-SKIQMLALSKLPDKREALIGLNNIIAKLSAAGSTEPLAEIALAQ 459
           G Y A L K ++LA PDK+E L + +K + + L +
Sbjct: 335

Query: 460
           + I+Y+ G L A++L P+N N LGYSLL +R++E L+ +
Sbjct: 391 VYFMEAIVYDNWSDIKNAEKM-RKAIELDPENPDYYNYLGYSLLLWYGKERV^

Query: 514 TAYQINPDDTAVNDSIGWAYYLKGDAESALPYLRYSF-ENDPEPEVAAHLGEVLWALGER
           A + +P++ A DS+GW YYLKGD E A+ YL + E +P V H+G+VL +G +
Sbjct: 451 KALEKDPENPAYIDSMGWVYYLKGDYERAMQYLLKALREAYDDPWNEHVGDVLLKMGYK

Query: 573 DQAVDVWTQAAHLRGDKK 590
           ++A + + +A L + K
Sbjct: 511




Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 7 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 49>: 

1 AACCTCTACG CCGGCCCGCA GACCACATCC GTCATCGCAA ACATCGCCGA 

51 CAACCTGCAA CTGGCCAAAG ACTACGGCAA AGTACACTGG TTCGCCTCCC

101 CGCTCTTCTG GCTCCTGAAC CAACTGCACA ACATCATCGG CAACTGGGGC 

151 TGGGCGATTA TCGTTTTAAC CATCATCGTC AAAGCCGTAC TGTATCCATT 

201 GACCAACGCC TCTTACCGCT CTATGGCGAA AATGCGTGCC GCCGCACCCA 

251 AACTGCAAGC CATCAAAGAG AAATACGGCG ACGACCGTAT GGCGCAACAA 

301 CAGGCGATGA TGCAGCTTTA CACAGACGAG AAAATCAACC CGaCTGGGCG

351 GCTGCCTGCC TATGCTGTTG CAAATCCCCG TCTTCATCGG ATTGTATTGG 

401 GCATTGTTCG CCTCCGTAGA ATTGCGCCAG GCACCTTGGC TGGGTTGGAT 

451 TACCGACCTC AGCCGCGCCG ACCCCTACTA CATCCTGCCC ATCATTATGG

501 CGGCAACGAT GTTCGCCCAA ACTTATCTGA ACCCGCCGCC GAcCGACCCG 

551 ATGCagGCGA AAATGATGAA AATCATGCCG TTGGTTTTCT CsGwCrTGTT

601 CTTCTTCTTC CCTGCCGGks TGGTATTGTA CTGGGTAGTC AACAACCTCC 

651 TGACCATCGC CCAGCAATGG CACATCAACC GCAGCATCGA AAAACAACGC 

701 GCCCAAGGCG AAGTCGTTTC CTAA 

This corresponds to the amino acid sequence <SEQ ID 50; ORF11>:

1 ..NLYAGPQTTS VIANIADNLQ LAKDYGKVHW FASPLFWLLN QLHNIIGNWG
51 WAIIVLTIIV KAVLYPLTNA SYRSMAKMRA AAPKLQAIKE KYGDDRMAQQ
101 QAMMQLYTDE KINPLGGCLP MLLQIPVFIG LYWALFASVE LRQAPWLGWI
151 TDLSRADPYY ILPIIMAATM FAQTYLNPPP TDPMQAKMMK IMPLVFSXXF
201 FFFPAGXVLY WVVNNLLTIA QQWHINRSIE KQRAQGEVVS *
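The X residues in this sequence line up with the lowercase ambiguity letters (s, w, r, k) in the partial DNA sequence <SEQ ID 49>. The sketch below shows one way such codons could be handled. It assumes the standard genetic code and the IUPAC nucleotide ambiguity codes, and the function name `translate` is illustrative, not a tool named in the text. A codon is translated to a residue only if every possible expansion encodes the same amino acid; otherwise X is emitted.

```python
from itertools import product

# Standard genetic code, ordered by first, second and third base in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate(dna: str) -> str:
    """Translate in frame 1; ambiguous codons give X unless all expansions agree."""
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        residues = {
            CODON_TABLE["".join(bases)]
            for bases in product(*(IUPAC[base] for base in codon))
        }
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# In-frame fragment taken from the rows numbered 551 and 601 of <SEQ ID 49>.
print(translate("TTGGTTTTCT CsGwCrTGTT CTTCTTCTTC CCTGCCGGks TGGTATTG"))
# prints "LVFSXXFFFFPAGXVL"
```

Here `TCs` and `GGk` still translate unambiguously (S and G), while `GwC`, `rTG` and `sTG` each cover two different amino acids and therefore appear as X, matching the stretch LVFSXXFFFFPAGXVL in the protein sequence.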

Further sequence analysis revealed the complete DNA sequence <SEQ ID 51>:

1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT 

51 GATCGGCTGG GAAAAGATGT TCCCCACTCC GAAGCCAGTC CCCGCGCCCC 

101 AACAGGCAGC ACAACAACAG GCCGTAACCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT 

201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG

251 CAACCGGCGA CGAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAAGAA 

301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG 

401 GCGACAAAGT TGAAGTCCGC CTGAGCGCGC CTGAAACACG CGGTCTGAAA 

451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG TTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTT TCCGACTTGG ACGACGATGC CAAATCCGGC AAATCCGAGG 

701 CCGAATACAT CCGCAAAACC CCGACCGGCT GGCTCGGCAT GATTGAACAC

751 CACTTCATGT CCACCTGGAT TCTCCAACCT AAAGGCAGAC AAAGCGTTTG 

801 CGCCGCAGGC GAGTGCAACA TCGACATCAA ACGCCGCAAC GACAAGCTGT 

851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CCATCCAAAA CGGCGCGAAA 

901 GCCGAAGCCT CCATCAACCT CTACGCCGGC CCGCAGACCA CATCCGTCAT 

951 CGCAAACATC GCCGACAACC TGCAACTGGC CAAAGACTAC GGCAAAGTAC
1001 ACTGGTTCGC CTCCCCGCTC TTCTGGCTCC TGAACCAACT GCACAACATC
1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC
1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGCTCTATG GCGAAAATGC
1151 GTGCCGCCGC ACCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC
1201 CGTATGGCGC AACAACAGGC GATGATGCAG CTTTACACAG ACGAGAAAAT
1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA
1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT
1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCCT ACTACATCCT
1401 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACTTAT CTGAACCCGC
1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCGTTGGTT
1501 TTCTCCGTCA TGTTCTTCTT CTTCCCTGCC GGTCTGGTAT TGTACTGGGT
1551 AGTCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA
1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA




This corresponds to the amino acid sequence <SEQ ID 52; ORF11-1>:

1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQQQ AVTASAEAAL
51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFILFGDGKE
101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK
151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT
201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH
251 HFMSTWILQP KGRQSVCAAG ECNIDIKRRN DKLYSTSVSV PLAAIQNGAK
301 AEASINLYAG PQTTSVIANI ADNLQLAKDY GKVHWFASPL FWLLNQLHNI
351 IGNWGWAIIV LTIIVKAVLY PLTNASYRSM AKMRAAAPKL QAIKEKYGDD
401 RMAQQQAMMQ LYTDEKINPL GGCLPMLLQI PVFIGLYWAL FASVELRQAP
451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV
501 FSVMFFFFPA GLVLYWVVNN LLTIAQQWHI NRSIEKQRAQ GEVVS*




Computer analysis of this amino acid sequence gave the following results: 

Homology with a 60kDa inner-membrane protein (accession P25754) of Pseudomonas putida
ORF11 and the 60kDa protein show 58% aa identity in 229 aa overlap (BLASTp).

ORF11 2   LYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIVLTIIVK 61
          LYAGP+ S + ++ L+L DYG + + A P+FWLL +H+++GNWGW+IIVLT+++K
60K   324 LYAGPKIQSKLKELSPGLELTVDYGFLWFIAQPIFWLLQHIHSLLGNWGWSIIVLTMLIK 383

ORF11 62  AVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRXXXXXXXXXLYTDEKINPLGGCLPM 121
          + +PL+ ASYRSMA+MRA APKL A+KE++GDDR LY EKINPLGGCLP+
60K   384 GLFFPLSAASYRSMARMRAVAPKLAALKERFGDDRQKMSQAMMELYKKEKINPLGGCLPI 443

ORF11 122 LLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLNPPPT 181
          L+Q+PVF+ LYW L SVE+RQAPW+ WITDLS DP++ILPIIM ATMF Q LNP P
60K   444 LVQIPVFLALYWVLLESVEMRQAPWILWITDLSIKDPFFILPIIMGATMFIQQRLNPTPP 503

ORF11 182 DPMQAKMMKIMPLVXXXXXXXXPAGXVLYWVVNNLLTIAQQWHINRSIE 230
          DPMQAK+MK+MP++ PAG VLYWVVNN L+I+QQW+I R IE
60K   504 DPMQAKVMKMMPIIFTFFFLWFPAGLVLYWVVNNCLSISQQWYITRRIE 552

Homology with a predicted ORF from N.meningitidis (strain A)

ORF11 shows 97.9% identity over a 240aa overlap with an ORF (ORF11a) from strain A of N.
meningitidis:

orfll.pep 
orflla 

orfll.pep 
orflla 



10 20 30 

NLYAGPQTTSVIANIADNLQLAKDYGKVHW 
I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I 
IKRRNDKLYSTSVSVPLAAIQNGAKSXASINLYAGPQTTSVIANIADNLQLXKDYGKVHW 
280 290 300 310 320 330 

40 50 60 70 80 90 

FASPLFWLLNQLHNI IGNWGWAI IVLT 1 1 VKAVLYPLTNAS YRSMAKMRAAAPKLQAIKE 
1 I 1 I I I 1 1 1 I I I 1 I 1 S 1 I I I I 1 I I I I t I I I t I 1 I I I ! I I t I I I I I I ! 1 1 I ! I I t I I I 1 I ! 
FAS PLFWLLNQLHNIIGNWGWAIIVLT 1 1 VKAVLYPLTNAS YRSMAKMRAAAPKLQAIKE 
340 350 360 370 380 390 




orfll.pep 
orflla 

orfll.pep 
orflla 

orfll.pep 
orflla 






10 o HO 120 13° 140 150 

KYGDDRMAQQQAMMQLYTDEKINPLGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWI 

MI{illMlllliliin llil,i lM!Mllllllllttillllil|||||Mllin 
KYGDDRMAQQQAMMQLYTD2KINPLGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWI 

400 410 4 20 430 440 4 50 

160 170 180 190 200 210 

TDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLY 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 inn mi ui 

TDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVXSXXFFXFPAGLVLY 
460 470 4 80 490 500 510 

220 230 240 

WWNN LLT I AQQWHINRS I EKQRAQGE WSX 

t I t | | | | t I I I 1 I 1 I I 1 I I I I I I I I 1 ■ 1 I I 1 
WVINNLLTIAQQWHINRS I EKQRAQGE WSX 

520 530 540 



The complete length ORF11a nucleotide sequence <SEQ ID 53> is:

1 ANGGATTTTA AAAGACTCAC NGNGTTTTTC GCCATCGCAC TGGTGATTAT
51 GATCGGATNG NAAANGATGT TCCCCACTCC GAAGCCCGTC CCCGCGCCCC
101 AACAGACGGC ACAACAACAG GCCGTAANCG CTTCCGCCGA AGCCGCGCTC
151 GCGCCCGNAN CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT
201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG
251 CAACCGGCGA CNAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAANAA
301 TACACCTACN TCGCCCANTC CGAACTTTTG GACGCGCAGG GCAACAACAT
351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG
401 GCGACAAAGT TGAAGTCCGC CTGAGCGCAC CTGAAACACG CGGTCTGAAA
451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG
501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT
551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC
601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA
651 AGTCAGCTTC TCCGACTTGG ACGACGATGC CAANTCCGGN AAATCCGAGG
701 CCGAATACAT CCGCAAAACC CNGACCGGCT GGCTCGGCAT GATTGAACAC
751 CACTTCATGT CCACCTGGAT CCTCCAACCC AAAGGCGGAC AAAGCGTTTG
801 CGCCGCTGGC GACTGCNGTA TNGACATCAA ACGCCGCAAC GACAAGCTGT
851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CTATCCAAAA CGGTGCGAAA
901 TCCNAAGCCT CCATCAACCT CTACGCCGGC CCACAGACCA CATCNGTTAT
951 CGCAAACATC GCCGACAACC TGCAACTGGN CAAAGACTAC GGCAAAGTAC
1001 ACTGGTTCGC CTCCCCCCTC TTTTGGCTTT TGAACCAACT GCACAACATC
1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC
1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGTTCGATG GCGAAAATGC
1151 GTGCCGCCGC GCCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC
1201 CGTATGGCGC AGCAACAAGC CATGATGCAG CTTTACACAG ACGAGAAAAT
1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA
1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT
1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCNT ACTACATCCT
1401 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACCTAT CTGAACCCGC
1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCTTTGGTT
1501 NTNTCNNNNA NGTTCTTCNK CTTCCCTGCC GGTCTGGTAT TGTACTGGGT
1551 GATCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA
1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA




This encodes a protein having amino acid sequence <SEQ ID 54>:

1 XDFKRLTXFF AIALVIMIGX XXMFPTPKPV PAPQQTAQQQ AVXASAEAAL
51 APXXPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDXNK PFILFGDGKX
101 YTYXAXSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK
151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT
201 HSYVGPVVYT PEGNFQKVSF SDLDDDAXSG KSEAEYIRKT XTGWLGMIEH
251 HFMSTWILQP KGGQSVCAAG DCXXDIKRRN DKLYSTSVSV PLAAIQNGAK
301 SXASINLYAG PQTTSVIANI ADNLQLXKDY GKVHWFASPL FWLLNQLHNI
351 IGNWGWAIIV LTIIVKAVLY PLTNASYRSM AKMRAAAPKL QAIKEKYGDD
401 RMAQQQAMMQ LYTDEKINPL GGCLPMLLQI PVFIGLYWAL FASVELRQAP
451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV
501 XSXXFFXFPA GLVLYWVINN LLTIAQQWHI NRSIEKQRAQ GEVVS*




ORF11a and ORF11-1 show 95.2% identity in 544 aa overlap:

orf11a.pep XDFKRLTXFFAIALVIMIGXXXMFPTPKPVPAPQQTAQQQAVXASAEAALAPXXPITVTT

P y WWW Mllllllltl I II I I I I I I II ||:t I I I I I* m I MM I 

orfll-1 MDFKRLTAFFA1ALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT 

10 20 30 40 SO 60 

70 80 90 100 HO 120 

orflla oep DTVQAV I DEKSGDLRRLTLLKYKATGDXNKPFI LFG DGKXYT YXAXSELLDAQGNN I LKG 

ti m ii iniiM I ii i n 1 1 1 inn immmi m i immtmm 

orfll-1 DTVQAVI DEKSGDLRRLTLLKYKATGDENKPFI LFGDGKEYTYVAQSELLDAQGNNILKG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orflla pep IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL 

* c i mi mi ii ii ii mi urn mil iii urn inn mil i ii urn iiiimm 

orfll-1 IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL 

130 140 150 160 170 180 

190 200 210 220 230 240 

orflla . pep S ADYRI VRDHSE PEGQGYFTHS YVGP WYT PEGN FQKVS FS DLDDDAXSGKSEAE Y I RKT 

* * I | Ml I ill I III I 1 1 I III I I III Ml I III III I I I I Ml I I I I I I I M M III III 

orfll-1 SADYRIVRDHSEPEGQGYFTHS YVGPWYT PEGN FQKVS FSDLDDDAKSGKSEAEYIRKT 

190 200 210 220 230 240 

250 260 270 280 290 300 

orflla . pep XTGWLGMIEHH FMSTW I LQPKGGQSVCAAG DCXXD I KRRNDKLYS TSVSVPLAAIQNGAK 

lllllllllllllllllllll 1111111:1 Ml Mill llllll Ml MM Mill 
orfll-1 PTGWLGMIEHHFMSTWILQPKGRQSVCAAGECNIDIIOIRNDKLYSTSVSVPLAAIQNGAK 

250 260 270 280 290 300 

310 320 330 340 350 360 

orflla . pep SXASINLYAGPQTTSVIANIADNLQIJCKDYGKVHWFASPLFWIJ^QLHNIIGNWGWAIIV 
: || 1 1| III Ml M II I M Ml M I Ml II II I II Ml I I I I M M M II II M M I I 
orfll-1 AEASINLYAGPQTTSVIANIADNLQIJ^CDYGKVHWFASPLFWIJ^QIJiNIIGlWGWAIIV 

310 320 330 340 350 360 

370 380 390 400 410 420 

orflla . pep LTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPL 
I I I I 1 1 I I I I III I Ml I II II I Ml II M M Ml II I Ml I M II Ml I M II M Ml I 
orfll-1 LTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPL 

370 380 390 400 410 420 

430 440 450 460 470 480 

orflla . pep GGCLW4LLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTY 
M II I I M 1 1 M M I II I I M M M II M II M M I M M M I I II II M I I II I M M I 
orfll-1 GGCLPMLLQI PVFIGLYWALFASVELRQAPWLGW I TDLSRADP YYI LPI IMAATM FAQTY 

430 440 450 460 470 480 

490 500 510 520 530 540 

orflla. pep LNPPPTDPMQAKMMKIMPLVXSXXFFXFPAGLVLYWVINNLLTIAQQWHINRSIEKQRAQ 
| || I || It III MM Mill I M Ml M I II II : I M M II II II I M M M Ml I 
orfll-1 i^pppTDPMQAKtMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRAQ 

490 500 510 520 530 540 



orflla. pep GEWSX 

mm 

orfll-1 GEWSX 



Homology with a predicted ORF from N.gonorrhoeae

ORF11 shows 96.3% identity over a 240aa overlap with a predicted ORF (ORF11.ng) from N.
gonorrhoeae:

orf11       NLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIVLT 57
            |||||||||||||||||||||||||||||||||||||||||||||||||||||:|||
orf11ng  MAVNLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIVVLT 60

orf11    IIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPLGG 117
         ||||||||||||||||||||||||||:||:|||||||||||||||||||: ||:||||||
orf11ng  IIVKAVLYPLTNASYRSMAKMRAAAPELQTIKEKYGDDRMAQQQAMMQLFEDEEINPLGG 120

orf11    CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 177
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11ng  CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 180

orf11    PPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLYWVVNNLLTIAQQWHINRSIEKQRAQGE 237
         ||||||||||||||||||||  ||||||| ||||||||||||||||||||||||||||||
orf11ng  PPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRAQGE 240

orf11    VVS 240
         |||
orf11ng  VVS 243
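The identity figures quoted for these comparisons can be reproduced from an aligned pair of sequences. The following is a minimal sketch, assuming that identity is counted as identical residues divided by the number of aligned columns in the overlap (gap columns included in the denominator); the function name `identity` is hypothetical and the alignment program's exact convention is not stated in the text. The example uses one 60-residue stretch of the comparison above (ORF11 residues 58-117 against ORF11ng residues 61-120).

```python
def identity(aligned_a: str, aligned_b: str) -> tuple:
    """Return (identical columns, total aligned columns) for two aligned sequences.

    Both strings must have the same length; '-' marks a gap and never counts
    as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return matches, len(aligned_a)


orf11 = "IIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPLGG"
orf11ng = "IIVKAVLYPLTNASYRSMAKMRAAAPELQTIKEKYGDDRMAQQQAMMQLFEDEEINPLGG"

same, total = identity(orf11, orf11ng)
print(f"{same}/{total} = {100 * same / total:.1f}% identity")
# prints "55/60 = 91.7% identity"
```

This stretch contains five of the differences between the two predicted proteins, so its local identity is lower than the 96.3% reported for the full 240 aa overlap.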

An ORF11ng nucleotide sequence <SEQ ID 55> was predicted to encode a protein having amino
acid sequence <SEQ ID 56>:

1 MAVNLYAGPQ TTSVIANIAD NLQLAKDYGK VHWFASPLFW LLNQLHNIIG
51 NWGWAIVVLT IIVKAVLYPL TNASYRSMAK MRAAAPELQT IKEKYGDDRM
101 AQQQAMMQLF EDEEINPLGG CLPMLLQIPV FIGLYWALFA SVELRQAPWL
151 GWITDLSRAD PYYILPIIMA ATMFAQTYLN PPPTDPMQAK MMKIMPLVFS
201 VMFFFFPAGL VLYWVVNNLL TIAQQWHINR SIEKQRAQGE VVS*

Further sequence analysis revealed the complete gonococcal DNA sequence <SEQ ID 57> to be: 

1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT 

51 GATCGGCTGG GAAAAAATGT TCCCCACCCC GAAACCCGTC CCCGCGCCCC

101 AACAGGCGGC ACAAAAACAG GCAGCAACCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTTAT 

201 TGATGAAAAA AGTGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CGAAAACAAA CCGTTCGTCC TGTTTGGCGA CGGCAAAGAA 

301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT

351 TCTGAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC ACCCTCAACG 

401 GCGACACAGT CGAAGTCCGC CTGAGCGCGC CCGAAACCAA CGGACTGAAA 

451 ATCGACAAAG TCTATACCTT TACCAAAGAC AGCTATCTGG TCAACGTCCG

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTC TCCgacTTgg acgACGATGC gaaaTccggc aaATccgagg 

701 ccgaatacaT CCGCAAAACC ccgaccggtt ggctcggcat gattgaacac 

751 cacttcatgt ccacctggat cctccAAcct aaaggcggcc aaaacgtttg 

801 cgcccaggga gactgccgta tcgacattaa aCgccgcaac gacaagctgt

851 acagcgcaag cgtcagcgtg cctttaaccg ctatcccaac ccgggggcca 

901 aaaccgaaaa tggcggTCAA CCTGTATGCC GGTCCGCAAA CCACATCCGT 

951 TATCGCAAAC ATCGCcgacA ACCTGCAACT GGCAAAAGAC TACGGTAAAG 

1001 TACACTGGTT CGCATCGCCG CTCTTCTGGC TCCTGAACCA ACTGCACAAC 

1051 ATTATCGGCA ACTGGGGCTG GGCAATCGTC GTTTTGACCA TCATCGTCAA

1101 AGCCGTACTG TATCCATTGA CCAACGcctC ctACCGTTCG ATGGCGAAAA 

1151 TGCGTGccgc cgcacCcaaA CTGCAGACCA TCAAAGAAAA ATAcgGCGAC 

1201 GACCGTATGG CGCAACAGCA AGCGATGATG CAGCTTTACA AAgacgAGAA 

1251 AATCAACCCG CTGGGCGGCT GTctgcctat gctgttgCAA ATCCCCGTCT 

1301 TCATCGGCTT GTACTGGGCA TTGTTCGCCT CCGTAGAATT GCGCCAGGCA

1351 CCTTGGCTGG GCTGGATTAC CGACCTCAGC CGCGCCGACC CCTACTACAT 

1401 CCTGCCCATC ATTATGGCGG CAACGATGTT CGCCCAAACC TATCTGAACC 

1451 CGCCGCCGAC CGACCCGATG CAGGCGAAAA TGATGAAAAT CATGCCGTTG 

1501 GTTTTCTCCG TCATGTTCTT CTTCTTCCCT GCCGGTTTGG TTCTCTACTG 

1551 GGTGGTCAAC AACCTCCTGA CCATCGCCCA GCAGTGGCAC ATCAACCGCA

1601 GCATCGAAAA ACAACGCGCC CAAGGCGAAG TCGTTTCCTA A 

This encodes a protein having amino acid sequence <SEQ ID 58; ORF11ng-1>:

1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQKQ AATASAEAAL
51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFVLFGDGKE
101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY TLNGDTVEVR LSAPETNGLK
151 IDKVYTFTKD SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT
201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH
251 HFMSTWILQP KGGQNVCAQG DCRIDIKRRN DKLYSASVSV PLTAIPTRGP
301 KPKMAVNLYA GPQTTSVIAN IADNLQLAKD YGKVHWFASP LFWLLNQLHN
351 IIGNWGWAIV VLTIIVKAVL YPLTNASYRS MAKMRAAAPK LQTIKEKYGD
401 DRMAQQQAMM QLYKDEKINP LGGCLPMLLQ IPVFIGLYWA LFASVELRQA
451 PWLGWITDLS RADPYYILPI IMAATMFAQT YLNPPPTDPM QAKMMKIMPL
501 VFSVMFFFFP AGLVLYWVVN NLLTIAQQWH INRSIEKQRA QGEVVS*

ORF11ng-1 and ORF11-1 show 95.1% identity in 546 aa overlap: 

10 20 30 40 50 60 

orfllnq-1 Dep MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQKQAATASAEAALAPATPITVTT 

orf 1 1- 1 MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT 
jO 10 20 30 40 50 60 

70 80 90 100 HO 120 

orfllna-1 Pep DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFVLFGDGKEYTYVAQSELLDAQGNNILKG 
9 ' W 1 1 1 j 1 1 | 1 1 1 1 M I M 1 1 M I 1 1 I I I 1 1 1 1 M 1 1 M M I I 1 1 M 1 1 M 1 1 M I 1 1 1 1 M t 

1 5 or*ll-l DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFILFGDGKEYTYVAQSELLDAQGNNILKG 

lJ no 80 90 100 110 120 

130 140 150 160 170 180 

orfllnq-1. pep IGFSAPKKQYTLNGDTVEVRLSAPETNGLKIDKVYTFTKDSYLVNVRFDIANGSGQTANL 

20 mi iiini:i:ii inn mi i m mi in n i n 1 1 1 n 1 1 n 1 1 1 1 1 1 1 1 

orf 11-1 IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYIjVNVRFDIANGSGQTANL 

130 140 150 160 170 180 

190 200 210 220 230 240 

25 orfllna-1 Pep sadyrivrdhsepegqgyfthsyvgpwytpegnfqkvsfsdldddaksgkseaeyirkt 

g ' H 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 II II II I II II 1 1 1 II 1 1 1 1 1 1 1 1 1 II 1 1 II 1 1 1 II I II 1 1 

orf ii-i sadyrivrdhsepegqgyfthsyvgpwytpegnfqkvsfsdldddaksgkseaeyirkt 

190 200 210 220 230 240 

on 250 260 270 280 290 300 

orfllna-1 .pep ptgwlgmiehhfkstwilqpkggqwcaqgdcridikrrndklysasvsvpltaiptrgp 

I I | I I 1 I I I I I I I I I I li 11 I I l:MI 1:1 I I I I I I I I I I I I : I I I II I : II : I 

orf 11-1 ptgwlgmiehhfmstwilqpkgrqsvcaagecnidikrrndklystsvsvplaaiqn-ga 

250 260 270 280 290 

15 

310 320 330 340 350 360 

orfllna-1 pep KPKMAVNLYAGPQTTSVIANIADNLQIJ^DYGKVHWFASPLFWLLNQLHNIIGNWGWAIV 

I : ::||| I I I I I I I I I I I I I I I I I I I I M II II I I I I I I I I I I I I I I I I I I : 

orf 11-1 KAEASINLYAGPQTTSVIANIADNLQIJ^DYGKVHWFASPLFWLLNQLHNIIGNWGWAII 
40 300 310 320 330 34 0 350 

370 380 390 400 410 420 

orfllna-1 pep VLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINP 
9 I Mill I I III Ml II Mil II I II I llllll:IM nil II II I II I I 111! HUM 

45 o r f 1 1 - 1 VLT 1 1 VKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINP 

360 370 380 390 400 410 

430 440 450 460 470 480 

orfllnq-1. pep LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQT 
50 I I I I I I II I I I I I I I I I I I I II I II I I II I I I I I I I I I II I II I I 1 II I I I I I II I I III 

orf 11-1 LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPI IMAATMFAQT 

420 430 440 450 460 470 

490 500 510 520 530 540 

55 orfllna-1 pep YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWWNNLLT I AQQWH INRSIEKQRA 

9 I I | | | | H II I I I II I I II I I II I I I I I I I I I I I I I I II I I I I Ml I I I I I I I II I I I II 

orfll-1 YLN PP PTDPMQAKMMKIMPLVFS VMFFFFPAGLVLYWWNNLLTI AQQWHI NRS I EKQRA 

480 490 500 510 520 530 



60 



orfllng-l.pep QGEWSX 
I II II I I 

orfll-1 QGEWSX 
540 



In addition, ORF11ng-1 shows significant homology with an inner-membrane protein from the 
database (accession number p25754): 

ID   60IM_PSEPU     STANDARD;      PRT;   560 AA. 
AC   P25754; 
DT   01-MAY-1992 (REL. 22, CREATED) 
DT   01-MAY-1992 (REL. 22, LAST SEQUENCE UPDATE) 
DT   01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE) 
DE   60 KD INNER-MEMBRANE PROTEIN. . . . 

SCORES   Init1: 1074   Initn: 1293   Opt: 1103 
Smith-Waterman score: 1406;   41.5% identity in 574 aa overlap 

In the first eight alignment blocks below, the upper sequence is orf11ng-1.pep and the lower sequence is p25754. 

10 20 30 40 

MDFKR LTAFFAIALVIMIGW EKMFPT PKPVPAPQQAAQKQ 

I | : | | : : I : : : : I : : : I : : I 1 I I I I : : : I : : 

MDIKRTILIAALAWSYVMVLKWNDDYGQAALPTQNTAASTVAPGLPDGVPAGNNGASAD 

10 20 30 40 50 60 

50 60 70 80 90 

AAT AS AEAALAPAT PIT VTTDTVQAVIDEKSGDLRRLTLLKYKATGDE-NKPF 

: : | : { | : : I :|:: I I I::: :ll :|l: :|:l II hill 

VPSANAESSPAELAPVALSKDLIRVKTDVLELAIDPVGGDIVQLNLPKYPRRQDHPNIPF 

70 80 90 100 110 120 

100 110 120 130 140 

VLFGDGKEYT YVAQSELLDAQGNN I LKGI G FSAPKKQYTL-NGD TVEVRLSAPE 

II : i I : ! : 1 I I I : : I : : : I : : I : I : I I : I : : I : : : : I 
QLFDNGGERVYLAQSGLTGTDGPDA-RASGRPLYAAEQKSYQLADGQEQLWDLKFS — r 
130 140 150 160 170 

150 160 170 180 190 200 

TNGLKIDKVYTFTKDSYLVNVRFDIANGSGQTANLSADYRIVRDHS-EPEGQGYF-THSY 
| [ : : | : : I : I :H : I I III: I : : : I I I : I : : I : I 
DNGVNYIKRFSFKRGEYDLNVSYLIDNQSGQAWNGNMFAQLKRDASGDPSSSTATGTATY 
180 190 200 210 220 230 

210 220 230 240 250 260 

VGPWYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKTPTGWLGMIEHHFMSTWILQPKGG 

: I : : : I : : I I I : : I : I I : : : I : : M : : : : I : I : : : I I I = 

LGAALWTASEPYKKVSMKDI D KGSLKE NVSGGWVAWLQHYFVTAWI -PAKSD 

240 250 260 270 280 

270 280 290 300 310 320 

QNVCAQGDCRIDIKRRNDKLYSASVSVPLTAI PTRGPKPKMAVNLYAGPQTTSVIANIAD 
: I | :::::: I : : I : : : I : I I : : : I I I I I : I : : : : 

NNV VQTRKDSQGNYIIGYTGPVISVPA-GGKVETSALLYAGPKIQSKLKELSP 

290 300 310 320 330 

330 340 350 360 370 380 

N LQLAKDYGKVHWF-AS P L FWLLN QLHN I IGNWGWAI WLTI IVKAVLYPLTNASYRSMA 
: | : | : ill : I I I : I : I I I I : : : I : : : I I I I I : I : I I I : : : I : : : : I I : I 1 1 I I I I 
GLELTVDYGFL-WFIAQPIFWLLQHIHSLLGNWGWSIIVLTMLIKGLFFPLSAASYRSMA 
340 350 360 370 380 390 

390 400 410 420 430 440 

KMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINPLGGCLPMLLQIPVFIGLYWALF 

:|||:MII ::U::IMI: ::IMI:MI I 1 1 I I I I I M I : I : I : I M : : I I I : I : 
RMRAVAPKIAAIJCERFGDDRQKMSQAMMELYKKEKINPLGGCLPILVQMPVFLALYWVLL 
400 410 420 430 440 450 



45C 460 470 480 490 500 

orfllng-l.pep ASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMXIMPLVF 
| | | : | | | | | : I I I I 1 I I I :: II II I I : 1 I I I 1 1111111111:11:11::! 
P 25754 ESVEMRQAPWILWITDLSIKDPFFILPIIMGATMFIQQRLNPTPPDPMQAKVMKMMPIIF 
460 470 480 490 500 510 

510 520 530 540 

orf llng-1 . pep SVMFFFFPAGLVLYWWNNLLTIAQOWHINRSIEKQRAQGEWSX 

: :|::l|llll!imi I |:|:IM:|:I II 
P 2 57 54 TFFFLWFPAGLVLYWWNNCLS ISQOWYI TRRIEAATKKAAA 

520 530 540 550 560 

Based on this analysis, including the homology to an inner-membrane protein from P. putida and 
the predicted transmembrane domains (seen in both the meningococcal and gonococcal proteins), it 
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 8 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 59>: 

1 GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT 
51 NAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA 

101 CGCCTGCCGC CGTCTTGACC GNCGCTCTGC TTTCCGCGCT GGGTATTTNG 

151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA 

201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGNCAC ACAGGCGGCA 

251 ACCGTTACGA AGTT.TTTAT CGCGGTACG. ACTGGCAGGC TCAAAATACG 

301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA 

351 AGGCAACCTT CTTATTATCA CACACCCTTA A 

This corresponds to the amino acid sequence <SEQ ID 60; ORF13>: 

1 . . AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT XALLSALGIX 
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVXY RGTXWQAQNT 
101 GQEELEPGTR ALIVRKEGNL LIITHP* 

Further sequence analysis elaborated the DNA sequence slightly <SEQ ID 61>: 

1 . . GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT 

51 nAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA 

101 CGCCTGCCGC CGTCTTGACC GnCGCTCTGC TTTCCGCGCT GGGTATTTnG 

151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA 

201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGACAC ACAGGCGGCA 

251 ACCGTTACGA AGTTTTtTAT CGCGGTACGc ACTGGCAGGC TCAAAATACG 

301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA 

351 AGGCAACCTT CTTATTATCA CACACCCTTA A 

This corresponds to the amino acid sequence <SEQ ID 62; ORF13-1>: 

1 . . AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT XALLSALGIX 
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVfY RGTHWQAQNT 

101 GQEELEPGTR ALIVRKEGNL LIITHP* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF13 shows 92.9% identity over a 126 aa overlap with an ORF (ORF13a) from strain A of 
N. meningitidis: 

10 20 30 40 50 

orfl3 pep AVL I I ELLTGTVYLLWS AALAG SG I AYGLTG ST PAAVLTXA LL5 ALG IX F 

' y y ii mi nun! mil inn it mi i ii in ii iii i niuiii i 

or fl3a MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTA ALLSALGIWF 
40 " 10 20 30 40 50 60 

60 70 80 90 100 110 

orfl3 pep VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA 

mini 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 = 1 1 1 1 1 : 1 1 m 1 1 1 Mi) miii mil limn 

45 orf!3a VHAKTAVGKVETDS YQDLDAGQYAEI LRHAGGNRYEV FYRGTHWQAQNTGQEE LE PGTRA 

70 80 90 100 110 120 

120 

orf13.pep  LIVRKEGNLLIITHPX 
           ||||||||||||::|| 
orf13a     LIVRKEGNLLIIAKPX 
                  130 

The complete length ORF13a nucleotide sequence <SEQ ID 63> is: 

1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT 

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG 

101 GCATTGCTTA CGGGCTGACC GGCAGCACGC CTGCCGCCGT CTTGACCGCC 

151 GCTCTGCTTT CCGCGCTGGG TATTTGGTTC GTACACGCCA AAACCGCCGT 

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATGCC GGGCAATATG 

251 CCGAAATCCT CCGGCACGCA GGCGGCAACC GTTACGAAGT TTTTTATCGC 

301 GGTACGCACT GGCAGGCTCA AAATACGGGG CAAGAAGAGC TTGAACCAGG 

351 AACGCGCGCC CTAATCGTCC GCAAGGAAGG CAACCTTCTT ATCATCGCAA 

401 AACCTTAA 

This encodes a protein having amino acid sequence <SEQ ID 64>: 

1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA 
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDA GQYAEILRHA GGNRYEVFYR 

101 GTHWQAQNTG QEELEPGTRA LIVRKEGNLL IIAKP* 
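
The relationship between SEQ ID 63 and SEQ ID 64 can be checked by translating the nucleotide sequence in its first reading frame. The following is a minimal sketch, assuming the standard genetic code; the names `CODON_TABLE`, `translate` and `SEQ_ID_63` are illustrative and are not part of the original disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unknown codons become 'X', stops '*'."""
    dna = "".join(dna.split()).upper()
    residues = []
    for i in range(0, len(dna) - 2, 3):
        residues.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(residues)


# ORF13a nucleotide sequence <SEQ ID 63> as listed above (whitespace is ignored).
SEQ_ID_63 = "".join("""
ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT
GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG
GCATTGCTTA CGGGCTGACC GGCAGCACGC CTGCCGCCGT CTTGACCGCC
GCTCTGCTTT CCGCGCTGGG TATTTGGTTC GTACACGCCA AAACCGCCGT
GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATGCC GGGCAATATG
CCGAAATCCT CCGGCACGCA GGCGGCAACC GTTACGAAGT TTTTTATCGC
GGTACGCACT GGCAGGCTCA AAATACGGGG CAAGAAGAGC TTGAACCAGG
AACGCGCGCC CTAATCGTCC GCAAGGAAGG CAACCTTCTT ATCATCGCAA
AACCTTAA
""".split())

protein = translate(SEQ_ID_63)
print(len(SEQ_ID_63), "nt ->", len(protein) - 1, "aa plus stop")
print(protein)
```

Ambiguous bases (for example the `N` positions in the partial sequences) fall through to `X` in this sketch, which mirrors how unresolved residues are written in the listings.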

ORF13a and ORF13-1 show 94.4% identity in 126 aa overlap: 



10 20 30 40 50 60 

orf 13a pep MTVW FVAAVAVLI IELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 

20 — - - | | 1 | | | | | I | | I I I I f t i I I I I I I I 1 1 I 1 1 1 1 I I I I I 1 I I MINIM I 

or f 1 3- 1 AVLI IELLTGTVYLLWS AALAGSGI AYGLTGSTPAAVLTXALLSALGIXF 

10 20 30 40 50 

70 80 90 100 110 120 

25 orf 13a .pep VHAKTAVGKVETDS YQDLDAGQYAEI LRHAGGNRYEVFYRGTHWQAQNTGQEELE PGTRA 

I I I I I I I I I i I I II I I I I I I I I : I I I I t : I I I I I ! I I I I I II I I I II I 1 I I I I I 

orf 13-1 VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELE PGTRA 

60 70 80 90 100 110 

                   130 
orf13a.pep  LIVRKEGNLLIIAKPX 
            ||||||||||||::|| 
orf13-1     LIVRKEGNLLIITHPX 
                  120 



Homology with a predicted ORF from N. gonorrhoeae 

ORF13 shows 89.7% identity over a 126 aa overlap with a predicted ORF (ORF13.ng) from N. 
gonorrhoeae: 

orf13             AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF  51 
                  |||||||||||||||||||||||||||||||||||||||| |||||||| | 
orf13ng  MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF  60 

orf13    VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA 111 
         ||||||| |||||||||||:|:|:||||:|||||||| |||| ||||||||| :|||||| 
orf13ng  VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA 120 

orf13    LIVRKEGNLLIITHP 126 
         ||||||||||||::| 
orf13ng  LIVRKEGNLLIIANP 135 


The complete length ORF13ng nucleotide sequence <SEQ ID 65> is: 

1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT 

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG 

101 GCATTGCCTA CGGGCTGACT GGCAGCACGC CTGCCGCCGT CTTGACCGCC 

151 GCACTGCTTT CCGCGCTGGG CATTTGGTTC GTACATGCCA AAACCGCCGT 

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATACC GGAAAATATG 

251 CCGAAATCCT CCGATACACA GGCGGCAACC GTTACGAAGT TTTTTATCGC 

301 GGTACGCACT GGCAGGCGCA AAATACGGGG CAGGAAGTGT TTGAACCGGG 

351 AACGCGCGCC CTCATCGTCC GCAAAGAAGG TAACCTTCTT ATCATCGCAA 

401 ACCCTTAA 

This encodes a protein having amino acid sequence <SEQ ID 66>: 

1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA 
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDT GKYAEILRYT GGNRYEVFYR 
101 GTHWQAQNTG QEVFEPGTRA LIVRKEGNLL IIANP* 

ORF13ng shows 91.3% identity in 126 aa overlap with ORF13-1: 

10 20 30 40 SO 

orfl3-l d€D AV LI I E L LTGTV YLLW S AALAG S G I AYG LTG S T P AAVLTXALLSALG I X F 

w miMlMMMMIIIM III MIMIMIMI Mill II II III I I 

orfl3na MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 
° rIAJn9 io 20 30 40 50 60 

60 70 80 90 100 HO 

or f 1 * - 1 Deo VHAKTAVRKVETDS YQDLDAGQYVEI LRHTGGNRYE VFYRGTHWQAQNTGQEELE PGTRA 

~ * F MtMM II | 1 I I : I : I : I I M : I 11 I I I M I M 1 1 M M I I I M I :MIIM 

orfl3na VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA 
9 70 80 90 100 110 120 



120 

orf 13-1 .pep LI VRKEGNLLI ITHPX 
1 I 1 1 1 1 I f 1 1 1 1 * x f f 
orfl3ng LI VRKEGNLLI IANPX 

130 

Based on this analysis, including the extensive leader sequence in this protein, it is predicted that 
ORF13 and ORF13ng are likely to be outer membrane proteins. It is thus predicted that the proteins 
from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines 
or diagnostics, or for raising antibodies. 
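
The identity figures quoted in this example can be checked by direct residue-by-residue comparison, because the ORF13 overlaps contain no gaps. The sketch below is illustrative only: it assumes that an unresolved residue (X) counts as a mismatch and that the overlap begins at residue 10 of ORF13ng, as the alignment indicates. The function name `percent_identity` is hypothetical, and the sequences are those listed above as SEQ ID 60 and SEQ ID 66.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length for an ungapped comparison")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# ORF13 <SEQ ID 60>, partial strain B sequence (126 aa, X = unresolved residue).
ORF13 = (
    "AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF"
    "VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA"
    "LIVRKEGNLLIITHP"
)

# ORF13ng <SEQ ID 66>, complete gonococcal sequence (135 aa).
ORF13NG = (
    "MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF"
    "VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA"
    "LIVRKEGNLLIIANP"
)

# The partial ORF13 sequence aligns with residues 10-135 of ORF13ng.
overlap = ORF13NG[9:]
print(f"{percent_identity(ORF13, overlap):.1f}% identity over {len(ORF13)} aa")
```

Under these assumptions 113 of the 126 positions are identical, which agrees with the 89.7% figure stated for ORF13 and ORF13ng.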

Example 9 

The following DNA sequence was identified in N. meningitidis <SEQ ID 67>: 

1 ATGTwTGATT TCGGTTTrGG CGArCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATwGtCCTC GGCCCCGAAC GCsTGCCCGA GGCCGCCCGC AyCGCCGGAC 

101 GGcTCATCGG CAGGCTGCAA CGCTTTGTCG GcAGCGTCAA ACAGGAATTT 

151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGcC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA . 

301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCGCT.TCC CGATGCGGCA AACACCCTAT CAGACGGCAT TTCCGACGTT 

401 ATGCCGTC. 

This corresponds to the amino acid sequence <SEQ ID 68; ORF2>: 

1 MXDFGLGELV FVGIIALIVL GPERXPEAAR XAGRLIGRLQ RFVGSVKQEF 
51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK 
101 LPEQRTPADF GVDENGNPXS RCGKHPIRRH FRRYAV. . 

Further work revealed the complete nucleotide sequence <SEQ ID 69>: 

1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC 

101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT 

151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCGCTTCCC GATGCGGCAA ACACCCTATC AGACGGCATT TCCGACGTTA 

401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG 

451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGCGCATG 

501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG 
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551 MGTCAGCTA TATCGATACT GCTGTTGAAA CGCCTGTTCC GCACACCACT 
601 TCCCTGCGCA AACAGGCAAT AAGCCGCAAA CGCGATTTTC GTCCGAAACA 
651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA 

This corresponds to the amino acid sequence <SEQ ID 70; ORF2-1>: 

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF 

51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK 

101 LPEQRTPADF GVDENGNPLP DAANTLSDGI SDVMPSERSY ASAETLGDSG 

151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT 

201 SLRKQAISRK RDFRPKHRAK PKLRVRKS* 

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 71>: 

1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC 

101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT 

151 GACACGCAAA TCGAACTGGA AGAACTAAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGCT GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAGGGTAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGCACGCC TGCTGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCCTTTCCC GATGCGGCAA ACACCCTATT AGACGGCATT TCCGACGTTA 

401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG 

451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGTGCATG 

501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG 

551 AAGTCAGCTA TATCGATACC GCTGTTGAAA CCCCTGTTCC GCATACCACT 

601 TCGCTGCGTA AACAGGCAAT AAGCCGCAAA CGCGATTTGC GTCCTAAATC 

651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA 

This encodes a protein having amino acid sequence <SEQ ID 72; ORF2a>: 

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF 

51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK 

101 LPEQRTPADF GVDENGNPFP DAANTLLDGI SDVMPSERSY ASAETLGDSG 

151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT 

201 SLRKQAISRK RDLRPKSRAK PKLRVRKS* 

The originally-identified partial strain B sequence (ORF2) shows 97.5% identity over a 118 aa 
overlap with ORF2a: 

10 20 30 40 50 60 

orf2 pep MXD FGLGELVFVGIIALIVL GPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR 

35 1 i 1 1 1 1 1 1 1 1 f 1 1 1 1 1 1 1 1 1 1 1 1 1 mi iriiiiiiinii i nun 1 1 ii ill 1 1 ii 

orf 2a MFD FGLGELVFVGIIALIVL GPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 
T6 20 30 40 50 60 

70 80 90 100 110 120 

40 orf2 peD KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS 

| | | | | (II | | | I I I I I I I I I I I I I I II I M I I I I I II I I I I I I I i I HI I I I III I I I 
orf 2a KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP 
70 80 90 100 110 120 

45 130 

or f 2 . pep RCGKH P I RRH FRRY AV 

0^f2a DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPW 
130 140 150 160 no 180 

The complete strain B sequence (ORF2-1) and ORF2a show 98.2% identity in 228 aa overlap: 

orf2a.pep  MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60 
           |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf2-1     MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60 

orf2a.pep  KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP 120 
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||:| 
orf2-1     KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 120 

orf2a.pep  DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV 180 
           |||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf2-1     DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV 180 

orf2a.pep  QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDLRPKSRAKPKLRVRKSX 229 
           ||||||||||||||||||||||||||||||||:||| |||||||||||| 
orf2-1     QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX 229 



Further work identified a partial DNA sequence <SEQ ID 73> in N. gonorrhoeae encoding the 
following amino acid sequence <SEQ ID 74; ORF2ng>: 

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL 
51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK 

101 LPEQRTPADF GVDEKGNSLS RYGKHRIRRH FRRYAV 

Further work identified the complete gonococcal gene sequence <SEQ ID 75>: 

1 ATGTTTGATT TCGGTTTGGG CGAGCTGATT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTT GGTCCAGAAC GCCTGCCCGA AGCCGCCCGC ACTGCCGGAC 

101 GGCTTATCGG CAGGCTGCAA CGCTTTGTAG GAAGCGTCAA ACAAGAACTT 

151 GACACTCAAA TCGAACTGGA AGAGCTGAGG AAGGTCAAGC AGGCATTCGA 

201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GATACGGATA 

251 TGCAGAACAG TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGCACGCc tgccgatttc gGTGTCGATg AAAacggcaa 

351 tCCCCttccc gATACGGCAA ACACCGTATC AGACGGCATT TCCGACGTTA 

401 TGCCGTCTGA ACGTTCCGAT ACTtccgcCG AAACCCTTGG GGACGACAGG 

451 CAAACCGGCA GTACAGCCGA ACCTGCGGAA ACCGACAAAG ACCGCGCATG 

501 GCGGGAATAC CTGactgctt ctgccgccgc acctgtcgta Cagagggccg 

551 tcgaagtcag ctaTATCGAT ACTGCTGTTG AAacgcctgT tccgcaCacc 

601 acttccctgc gcaAACAGGC AATAAACCGC AAACGCGATT TttgtccgaA 

651 ACACCGCGCc aAACCGAAat tgcgcgtcCG TAAATCATAA 

This encodes a protein having the amino acid sequence <SEQ ID 76; ORF2ng-1>: 

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL 

51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK 

101 LPEQRTPADF GVDENGNPLP DTANTVSDGI SDVMPSERSD TSAETLGDDR 

151 QTGSTAEPAE TDKDRAWREY LTASAAAPVV QRAVEVSYID TAVETPVPHT 

201 TSLRKQAINR KRDFCPKHRA KPKLRVRKS* 

The originally-identified partial strain B sequence (ORF2) shows 87.5% identity over a 136aa 
overlap with ORF2ng: 

15 orf2 oeD mXDFGLGELVFVGIIALIVLGPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR 60 

11 on;. pep ||, | M UN HII Ml I I II I I I I I I : I I I II 1 1 I Ml I I I I I I I : I I I I I I I I I I 

orf2ng MFDFGLGELI FVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60 

orf2 oeD kAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS 120 

40 l:tl I ! II II M M I M 1 I I M : : : M II M I I M I M I I I M M I M M M : 1 1 

orf2ng kVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDEKGNSLP 120 

orf2.pep RCGKHPIRRHFRRYAV 136 

I ill 1 1 1 1 i ll 1 1 1 

45 orf2ng RYGKHRIRRH FRRYAV 136 

The complete strain B and gonococcal sequences (ORF2-1 & ORF2ng-1) show 91.7% identity in 
229 aa overlap: 

10 20 30 40 50 60 

orf2-l t>eo MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 
50 II 1 I I I I II : I I I I M I I I I M M M I II 11 M II M I M I M M I I I I : III M I I I M 

orf2nq-l MFDFGLGELI FVGIIALIVLGPERLPEAARTAGRLIGRLQREVGSVKQELDTQIELEELR 

10 20 30 40 50 60 

70 80 90 100 110 120 

55 orf2-l oeD KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 

" PP |:|| i 1 1 I I 1 1 1 1 I I t I t 1 MI:::MMIMIMMIMMIMIMMIIIIIMI 

orf2ng-1 KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf2-l pep DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPW 

5 1:111: Illltll :IIIMIIs I I I I I I I || I I t : H II I t I II I I I I I I I 1 

orf2nq-l DTANTVSDGISDVMPSERSDTSAETLGDDRQTGSTAEPAETDKDRAWREYLTASAAAPVV 

130 140 150 160 170 180 

190 200 210 220 229 

10 orf2-l pep Q- TVEV S Y I DT AVET P V PHTT S LRKQAI S RKRD FRPKHRAKPKLRVRKSX 

* * I : I 1 I I I I 1 I I | | I I I I I It I I I I I ! I : I I I I I I M I I I I I I I I I I I I 

orf2ng-l QRAVEVSYIDTAVETPVPHTTSLRKQAINRKRDFCPKHRAKPKLRVRKSX 

190 200 210 220 230 

Computer analysis of these amino acid sequences indicated a transmembrane region (underlined), 
and also revealed homology (34% identity, 59% positives) between the gonococcal sequence and 
the TatB protein of E. coli: 

gnl|PID|e1292181 (AJ005830) TatB protein [Escherichia coli] Length = 171 
Score = 56.6 bits (134), Expect = 1e-0*? 

Identities = 30/88 (34%), Positives = 52/88 (59%), Gaps = 1/88 (1%) 

Query: 1 MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60 

MFD G EL+ V II L+VLGP+RLP A +T I L+ +V+ EL -r++L+E + 
Sbjct: 1 MFDIGFSELLLVFIIGLWLGPQRLPVAVKTVAGWIRALRSLATTVQNELTQELKLQEFQ 60 



25 Query: 61 - KVKQAFEAAAAQVRDS LKET DT DMQN S 87 

+K+ +A+ + LK + +++ + 
Sbjct: 61 DSLKKVEKASLTNLT PELKASMDELRQA 88 

Based on this analysis, it was predicted that ORF2, ORF2a and ORF2ng are likely to be membrane 
proteins and so the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF2-1 (16kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described above. 
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 3A 
shows the results of affinity purification of the GST-fusion protein, and Figure 3B shows the results 
of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise mice, 
whose sera were used for Western blots (Figure 3C), ELISA (positive result), and FACS analysis 
(Figure 3D). These experiments confirm that ORF2-1 is a surface-exposed protein, and that it is 
a useful immunogen. 

Example 10 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 77>: 

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGC.TGCGGG ACACTGACAG GTATTCCATC GCATGGCGgA GkTAAACgCT 

101 TTgCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 CACTATGGGC GACCAAGGTT CAGGcAGTTT GACAGGGGGG JCGCTACTCC 

45 251 ATTGATGCAC JcGrTwCsTGG CGAATACATA AACAGCCCTG CCGTCCGTAC 

301 CGATTACACC TATCCACGTT ACGAAACCAC CGCTGAAACA ACATCAGGCG 

351 GTTTGACAGG TTTAACCACT TCTTTATCTA CACTTAATGC CCCTGCACTC 

401 TCTCGCACCC AATCAGACGG TAGCGGAAGT AAAAGCAGTC TGGGCTTAAA 

451 TATTGGCGGG ATGGGGGATT ATCGAAATGA AACCTTGACG ACTAACCCGC 
501 GCGACACTGC CTTTCTTTCC CACTTGGTAC AGACCGTATT TTTCCTGCGC 
551 GGCATAGACG TTGTTTCTCC TGCCAATGCC GATACAGATG TGTTTATTAA 
601 CATCGACGTA TTCGGAACGA TACGCAACAG AACCGAAATG . . 

This corresponds to the amino acid sequence <SEQ ID 78; ORF15>: 

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG XKRFAVEQEL VAASARAAVK 

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDAXXXG EYINSPAVRT 

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN 

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN 

201 IDVFGTIRNR TEM. . 

Further work revealed the complete nucleotide sequence <SEQ ID 79>: 

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT 

101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 CACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA 

251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC 

301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG 

351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT 

401 CTCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT 

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG 

501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 

701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA 

801 AGGAATTAAA CCGACGGAAG GATTAATGGT CGATTTCTCC GATATCCGAC 

851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GTAGTGCGAC AACATAGACA 

951 AGGACAACCT TGA 

This corresponds to the amino acid sequence <SEQ ID 80; ORF15-1>: 

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK 

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT 

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN 

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN 

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA 

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIRPYGNHTG NSAPSVEADN 

301 SHEGYGYSDE VVRQHRQGQP* 

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 81>: 

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT 

101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA 

251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC 

301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG 

351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT 

401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT 

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG 

501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACGGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 

701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGACCGTATA AAGTAAGCAA 

801 AGGAATTAAA CCGACAGAAG GATTAATGGT CGATTTCTCC GATATCCAAC 

851 CATACGGCAA TCATATGGGT AACTCTGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC GACATAGACA 

951 AGGGCAACCT TGA 

This encodes a protein having amino acid sequence <SEQ ID 82; ORF15a>: 

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK 

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT 

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN 

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN 

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA 

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHMG NSAPSVEADN 

301 SHEGYGYSDE AVRRHRQGQP * 


The originally-identified partial strain B sequence (ORF15) shows 98.1% identity over a 213 aa 
overlap with ORF15a: 

10 20 30 40 50 60 

or f 1 5 . pep MQARLLIPILFSVFILSA CGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR 
MIMIIMIIIIIMMIMIIIIIilll I I i I I I I I I I I I I I I I M M I I I ! I I I I I 
orfl5a MOARLLIPILFSVFILSA CGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 

l0 20 30 40 50 60 

70 80 90 100 110 120 

orfl5.pep KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG 
I I I I I I I I I I I II I I I II I I I I I I I I 1 I I I I I I I I I I I I J I I I I M I I I I I 1 I I I I I 
orf 15a KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 15. pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 
I I I I I I I I I I I I I I II I I I 1 I I I I I I I I I I I I I I I I I I II I i I I I I I I I I II I I I I I I I I 
orf 15a LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 

130 140 150 160 170 180 

190 200 210 

orflS.peo FLRGIDVVSPANADTDVFINIDVFGTIRNRTEM 
I I I I I I I I M I I I I I M I I I I I I I M I I I I I I I 
orf 15a FLRG I DWSP ANADTDVFIN I DVFGT I RNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 

190 200 210 220 230 240 

The complete strain B sequence (ORF15-1) and ORF15a show 98.8% identity in 320 aa overlap: 

10 20 30 40 50 60 

orf 15a . oep MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 
I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I i t I I I I I I I I I I 
orf 15-1 MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 

10 20 30 40 50 60 



70 BO 90 100 110 120 

40 orfl5a.peo KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETT AETTSGGLTG 

I ! I I I I I I I I ! I I I I I I I II I I I I I I 1 I I I I I I! I M I M I I I I I I I I I I i I I I I i II i I 
orf 15-1 KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orfl5a.peD LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 
I I I I I I I I I I I I II II I I II I I I I I 1 I I I I M I M I I I I I I I I I I M I I I I I I I I I I I I I 
orf 15-1 LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVOTVF 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf 15a . pep FLRGIDWSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 
1 I I I 11 I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I I 
orf 1 5- 1 FLRG I DWS PANADTDVFIN I DVFGTI RNRTEMHLYNAETLKAQTKLSYFAVDRTNKKLL 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf!5a.pep I KPKTN AFEAAYKEN Y ALWMGP YKVSKG IK PTEGLMVDFSDI QP YGNHMGN S APS VEADN 

I I I I I I I I I I I I I I I I I I t I I I I ! I II I I I I I I I I I II I II I : I I I II I I I I I II I I I I 
60 orf 15-1 IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGU4VDFSDIRPYGNHTGNSAPSVEADN 

250 260 270 280 290 300 
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310 320 
or f 1 5a . pep SHEGYGYSDEAVRRHRQGQPX 
65 I I I I I I I I I I : I I : I I M I I I 

or f 1 5- 1 SHEGYGYSDEWRQHRQGQPX 
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F urther work identified the corresponding gene in N. gonorrhoeae <SEQ ID 83>: 

1 ATGCGGGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGCAAACGCT 

101 TCGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA 

251 TTGATGCACT GATTCGCGGC GAATACATAA ACAGCCCTGC CGTCCGCACC 

301 GATTACACCT ATCCGCGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG 

351 TTTGACGGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT

401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA GGAGCAGTCT GGGCTTAAAT 

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CCAACCCGCG 

501 CGACACTGCC TTTCTTTCCC ACTTGGTGCA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 

701 GAACCAATAA AAAATTGCTC ATCAAACCCA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA 

801 AGGAATCAAA CCGACGGAAG GATTGATGGT CGATTTCTCC GATATCCAAC 

851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC AACATAGACA 

951 AGGGCAACCT TGA 

This encodes a protein having amino acid sequence <SEQ ID 84; ORF15ng>:

1 MRARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSRSSLGLN

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHTG NSAPSVEADN

301 SHEGYGYSDE AVRQHRQGQP *
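
The relationship between SEQ ID 83 and SEQ ID 84 can be checked by translating the nucleotide listing in its first reading frame. The following is a minimal sketch, assuming the standard genetic code; the helper name `translate` is illustrative and is not part of the original disclosure.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; spaces are ignored, unreadable codons give 'X'."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # drop a trailing partial codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Nucleotides 1-50 of SEQ ID 83, copied from the listing.
first_row = "ATGCGGGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC"
print(translate(first_row))       # start of ORF15ng

# Nucleotides 952-963 of SEQ ID 83 (the last four codons, including the stop).
print(translate("GGGCAACCT TGA"))
```

The first call reproduces the opening residues of SEQ ID 84, and the second shows the terminal stop codon that the listing prints as an asterisk.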

The originally-identified partial strain B sequence (ORF15) shows 97.2% identity over a 213aa 
overlap with ORF15ng: 

orf15.pep   MQARLLIPILFSVFILSACGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR 60
            |:|||||||||||||||||||||||||||| |||||||||||||||||||||||||||||
orf15ng     MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 60

orf15.pep   KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120
            ||||||||||||||||||||||||||   |||||||||||||||||||||||||||||||
orf15ng     KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120

orf15.pep   LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180
            |||||||||||||||||||||||:||||||||||||||||||||||||||||||||||||
orf15ng     LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180

orf15.pep   FLRGIDVVSPANADTDVFINIDVFGTIRNRTEM 213
            |||||||||||||||||||||||||||||||||
orf15ng     FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 240

The complete strain B sequence (ORF15-1) and ORF15ng show 98.8% identity in 320 aa overlap:

                    10        20        30        40        50        60
orf15-1.pep MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR
            |:||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng     MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf15-1.pep KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng     KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf15-1.pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
            |||||||||||||||||||||||:||||||||||||||||||||||||||||||||||||
orf15ng     LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf15-1.pep FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng     FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf15-1.pep IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN
            ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf15ng     IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHTGNSAPSVEADN
                   250       260       270       280       290       300

                   310       320
orf15-1.pep SHEGYGYSDEVVRQHRQGQPX
            ||||||||||:||||||||||
orf15ng     SHEGYGYSDEAVRQHRQGQPX
                   310       320

Computer analysis of these amino acid sequences reveals an ILSAC motif (putative membrane 
lipoprotein lipid attachment site, as predicted by the MOTIFS program). 

indicates a putative leader sequence, and it was predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
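
The position of the ILSAC motif can be located directly in the two N-terminal sequences shown in the alignment above. The sketch below is illustrative only: the regular expression is a commonly quoted lipobox-style consensus chosen here as an assumption, and it is not the rule set used by the MOTIFS program; the function names are likewise hypothetical.

```python
import re

# First 60 residues of ORF15-1 and ORF15ng, taken from the alignment above.
ORF15_1_NTERM = "MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR"
ORF15NG_NTERM = "MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR"

# Assumed, simplified lipobox-style pattern ending in the lipid-modified Cys.
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")


def motif_position(seq, motif="ILSAC"):
    """Return the 1-based start of the literal motif, or None if absent."""
    index = seq.find(motif)
    return index + 1 if index >= 0 else None


def lipobox_cysteines(seq):
    """Return 1-based positions of the Cys closing each lipobox-like match."""
    # match.end() is the 0-based exclusive end, i.e. the 1-based Cys position.
    return [match.end() for match in LIPOBOX.finditer(seq)]


for name, seq in (("ORF15-1", ORF15_1_NTERM), ("ORF15ng", ORF15NG_NTERM)):
    print(name, motif_position(seq), lipobox_cysteines(seq))
```

In both proteins the motif begins at residue 15, so the candidate lipid-attachment cysteine is residue 19.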

ORF15-1 (31.7kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
4A shows the results of affinity purification of the GST-fusion protein, and Figure 4B shows the
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 4C) and ELISA (positive result). These
experiments confirm that ORF15-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 11 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 85>: 

1 GG CAGCACA AAAAACAGGC GGTTGAACGG AAAAACCGTA TTTACGATGA

51 TGCCGGGTAT GATATTCGGC GTATTCACGG GCGCATTCTC CGCAAAATAT 

101 ATCCCCGCGT TCGGGCTTCA AATTTTCTTC ATCCTGTTTT TAACCGCCGT 

151 CGCATTCAAA ACACTGCATA CCGACCCTCA GACGGCATCC CGCCCGCTGC

201 CCGGACTGCC CrGACTGACT GCGGTTTCCA CACTGTTCGG CACAATGTCG 

251 AGCTGGGTCG GCATAGGCGG CGGTTCACTT TCCGTCCCCT TCTTAATCCA

301 CTGCGGCTTC CCCGCCCATA AAGCCATCGG CACATCATCC GGCCTTGCCT 

351 GGCCGATTGC ACTCTCCGGC GCAATATCGT ATCTGCTCAA CGGCCTGAAT 

401 ATTGCAGGAT TGCCCGAAGG GTCACTGGGC TTCCTTTACC TGCCCGCCGT 

451 CGCCGTCCTC AGCGCGGCAA CCATTGCCTT TGCCCCGCTC GGTGTCAAAA 

501 CCGCCCACAA ACTTTCTTCT GCCAAACTCA AAAAATC.TT CGGCATTATG

551 TTGCTTTTGA TTGCCGGAAA AATGCTGTAC AACCTGCTTT AA 

This corresponds to the amino acid sequence <SEQ ID 86; ORF17>: 

1 GQHKKQAVNG KTVFTMMPGM IFGVFTGAFS AKYIPAFGLQ IFFILFLTAV
51 AFKTLHTDPQ TASRPLPGLP XLTAVSTLFG TMSSWVGIGG GSLSVPFLIH 






101 CGFPAHKAIG TSSGLAWPIA LSGAISYLLN GLNIAGLPEG SLGFLYLPAV 
151 AVLSAATIAF APLGVKTAHK LSSAKLKKSF GIMLLLIAGK MLYNLL* 

Further work revealed the complete nucleotide sequence <SEQ ID 87>: 

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCGTATTTAC GATGATGCCG GGTATGATAT TCGGCGTATT CACGGGCGCA 

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG 

401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG 

451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG

601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA 

751 TC.TTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 

801 GCTTTAA

This corresponds to the amino acid sequence <SEQ ID 88; ORF17-1>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMIFGVFTGA

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL

151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL

201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK

251 XFGIMLLLIA GKMLYNLL*
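
Residue 251 of ORF17-1 is listed as X because codon 251 (nucleotides 751-753 of SEQ ID 87) reads TC. with an unresolved third base. The sketch below, which assumes the standard genetic code and uses a hypothetical helper `possible_residues`, enumerates what such a codon could encode.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def possible_residues(codon):
    """Return the set of amino acids a codon could encode.

    A '.' or 'N' marks an unread base and is expanded to all four bases.
    """
    options = [BASES if base in ".N" else base for base in codon.upper()]
    return {CODON_TABLE["".join(bases)] for bases in product(*options)}


print(possible_residues("TC."))  # codon 251 of SEQ ID 87
print(possible_residues("GNG"))  # an unread middle base is not resolvable
```

Every TCN codon encodes serine, so the X at position 251 is compatible with the S found at the same position in the strain A and gonococcal sequences below; an unread first or second base, by contrast, generally leaves the residue undetermined.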

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical H. influenzae transmembrane protein HI0902 (accession number P44070)
ORF17 and HI0902 proteins show 28% aa identity in 192 aa overlap:

ORF17 3 HKKQAVNGKTVFTMMPGMIFGVFT-GAFSAKYIPAFGLQIF--FILFLTAVAFKTLHTDP 59

HK + +V+P++VFGF + +IF +++L ++ D

HI0902 72 HKLGNIVWQAVRILAPVIMLSVFICGLFIGRLDREISAKIFACLWYLATKMVLSIKKD- 130

ORF17 60 QTASRPLPGLPXLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPI 119

Q ++ L L + L G SS GIGGG VPFL G +AIG+S+ +

HI0902 131 QVTTKSLTPLSSVIG-GILIGMASSAAGIGGGGFIVPFLTARGINIKQAIGSSAFCGMLL 189

ORF17 120 ALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVXXXXXXXXXXXXXX 179
+SG S++++G +PE SLG++YLPAV ++A + + LG

HI0902 190 GISGMFSFIVSGWGNPLMPEYSLGYIYLPAVLGITATSFFTSKLGASATAKLPVSTLKKG 249

ORF17 180 FGIMLLLIAGKM 191
F + L+++A M
HI0902 250 FALFLIVVAINM 261

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF17 shows 96.9% identity over a 196aa overlap with an ORF (ORF17a) from strain A of N. 
meningitidis: 

                                                  10        20        30
orf17.pep                                 GQHKKQAVNGKTVFTMMPGMIFGVFTGAFS
                                          ||||||||: ||||||||||:||||:||:|
orf17a      QGLAQHPYAQHLAVGTSFAVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMVFGVFAGALS
                  50        60        70        80        90       100

                    40        50        60        70        80        90
orf17.pep   AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG
            |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
orf17a      AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGG
                 110       120       130       140       150       160

                   100       110       120       130       140       150
orf17.pep   GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17a      GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV
                 170       180       190       200       210       220

                   160       170       180       190
orf17.pep   AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLLX
            |||||||||||||||||||||||||||||||||||||||||||||||
orf17a      AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLLX
                 230       240       250       260



The complete length ORF17a nucleotide sequence <SEQ ID 89> is:

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC
51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC
101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC
151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC
201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA
251 CCGTATTTAC GATGATGCCG GGTATGGTAT TCGGCGTATT CGCTGGCGCA
301 CTCTCCGCAA AATATATCCC AGCGTTCGGG CTTCAAATTT TCTTCATCCT
351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG
401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG
451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT
501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT
551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG
601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT
651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC
701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA
751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT
801 GCTTTAA


This encodes a protein having amino acid sequence <SEQ ID 90>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY
51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMVFGVFAGA
101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL
151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL
201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK
251 SFGIMLLLIA GKMLYNLL*




ORF17a and ORF17-1 show 98.9% identity in 268 aa overlap:

                    10        20        30        40        50        60
orf17a.pep  MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1     MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf17a.pep  AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMVFGVFAGALSAKYIPAFGLQIFFILFLT
            ||||||||||||||||||||||||||||||||:||||:||||||||||||||||||||||
orf17-1     AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf17a.pep  AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1     AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf17a.pep  IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1     IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
                   190       200       210       220       230       240

                   250       260      269
orf17a.pep  HKLSSAKLKKSFGIMLLLIAGKMLYNLLX
            |||||||||| ||||||||||||||||||
orf17-1     HKLSSAKLKKXFGIMLLLIAGKMLYNLLX
                   250       260

Homology with a predicted ORF from N. gonorrhoeae

ORF17 shows 93.9% identity over a 196aa overlap with a predicted ORF (ORF17.ng) from N.
gonorrhoeae:

orf17.pep                                 GQHKKQAVNGKTVFTMMPGMIFGVFTGAFS 30
                                          ||||||||: ||:|:||||||||||:||:|
orf17ng     QGLAQHPYAQHLAVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALS 102

orf17.pep   AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG 90
            |||||||||||||||||||||||||||  ||||||||||| |||||||||:|||||||||
orf17ng     AKYIPAFGLQIFFILFLTAVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGG 162

orf17.pep   GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV 150
            ||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||
orf17ng     GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAV 222

orf17.pep   AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLL 196
            |||||||||||||||||||||||||||:||||||||||||||||||
orf17ng     AVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKMLYNLL 268

An ORF17ng nucleotide sequence <SEQ ID 91> is predicted to encode a protein having amino acid
sequence <SEQ ID 92>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE

251 SFGIMLLLIA GKMLYNLL*

Further work revealed the complete gonococcal DNA sequence <SEQ ID 93>: 

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCcgtag gcAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT Tcggtgtagg cggcgGTACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CAcaTccttc gcCGTCATGG TCTTCACCGC

201 CTTTTCCAGT ATGTTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCATATTTGC GATGATGCCG GGTATGATAT TCGGCGTATT CGCTGGCGCA 

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT 

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGGT CGTCAGACGG 

401 CATCCCGCCC GCTGCCCGGG CTGCCCGGAC TGACTGCGGT TTCCACACTG

451 TTCGGCGCAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT 

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 

601 GTCAACGGTC TGAATATTGC AGGATTGCCC GAAGGGTCGC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAGAA 

751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 

801 GCTTTAA 

This corresponds to the amino acid sequence <SEQ ID 94; ORF17ng-1>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE

251 SFGIMLLLIA GKMLYNLL*

ORF17ng-1 and ORF17-1 show 96.6% identity in 268 aa overlap:

                    10        20        30        40        50        60
orf17-1.pep MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17ng-1   MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf17-1.pep AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT
            ||||||||||||||||||||||||:|:||||||||||:||||||||||||||||||||||
orf17ng-1   AVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALSAKYIPAFGLQIFFILFLT
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf17-1.pep AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
            |||||||||  |||||||||||||||||||||:|||||||||||||||||||||||||||
orf17ng-1   AVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGGGSLSVPFLIHCGFPAHKA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf17-1.pep IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
            ||||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||
orf17ng-1   IGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
                   190       200       210       220       230       240

                   250       260      269
orf17-1.pep HKLSSAKLKKXFGIMLLLIAGKMLYNLLX
            |||||||||: ||||||||||||||||||
orf17ng-1   HKLSSAKLKESFGIMLLLIAGKMLYNLLX
                   250       260

In addition, ORF17ng-1 shows significant homology with a hypothetical H.influenzae protein:

sp|P44070|Y902_HAEIN HYPOTHETICAL PROTEIN HI0902 pir||G64015 hypothetical protein
HI0902 - Haemophilus influenzae (strain Rd KW20) gi|1573922 (U32772) H. influenzae
predicted coding region HI0902 [Haemophilus influenzae] Length = 264

Score = 74 (34.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23
Identities = 15/43 (34%), Positives = 23/43 (53%)

Query:  55 AVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVF 97
           A+GTSFA +V T   S    HK   + W+ +  + P ++  VF
Sbjct:  52 ALGTSFATIVITGIGSAQRHHKLGNIVWQAVRILAPVIMLSVF 94

Score = 195 (91.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23
Identities = 44/114 (38%), Positives = 65/114 (57%)

Query: 150 LFGAMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGL 209
           L G  SS  GIGGG   VPFL   G    +AIG+S+     + +SG  S++V+G     +
Sbjct: 148 LIGMASSAAGIGGGGFIVPFLTARGINIKQAIGSSAFCGMLLGISGMFSFIVSGWGNPLM 207

Query: 210 PEGSLGFLYLPAVAVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKM 263
           PE SLG++YLPAV  ++A +   + LG     KL  + LK+ F + L+++A  M
Sbjct: 208 PEYSLGYIYLPAVLGITATSFFTSKLGASATAKLPVSTLKKGFALFLIVVAINM 261

This analysis, including the homology with the hypothetical H.influenzae transmembrane protein, 
suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
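
The percentages printed in the BLAST output above can be reproduced from the raw match counts. The sketch below assumes that the program truncates rather than rounds, which is an inference from the printed figures and not something stated in the original; the function name is illustrative.

```python
def blast_percent(matches: int, length: int) -> int:
    """Whole-number percentage, truncated (assumed to mirror the printed output)."""
    return 100 * matches // length


# (identities, positives, alignment length) for the two segment pairs above.
segments = [(15, 23, 43), (44, 65, 114)]

for identities, positives, length in segments:
    print(
        f"{identities}/{length} = {blast_percent(identities, length)}% identical, "
        f"{positives}/{length} = {blast_percent(positives, length)}% positive"
    )

# Pooled identity over both segment pairs (157 aligned positions in total).
total_identities = sum(s[0] for s in segments)
total_length = sum(s[2] for s in segments)
print(f"pooled: {total_identities}/{total_length}")
```

Rounding to the nearest integer would give 35% for the first segment pair, whereas truncation gives the 34% shown, which is why truncation is assumed here.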

Example 12 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 95>:

1 GGAAACGGAT GGCAGGCAGA CCCCGAACAT CCGCTGCTCG GGCTTTTTGC 

51 CGTCAGTAAT GTATCGATGA CGCTTGCTTT TGTCGGAATA TGTGCGTTGG 

101 TGCATTATTG CTTTTCGGGA ACGGTTCAAG TGTTTGTGTT TGCGGCACTG 

151 CTCAAACTTT ATGCGCTGAA GCCGGTTTAT TGGTTCGTGT TGCAGTTTGT

201 GCTGATGGCG GTTGCCTATG TCCACCGCTG CGGTATAGAC CGGCAGCCGC

251 CGTCAACGTT CGGCGGCTCG CAGCTGCGAC TCGGCGGGTT GACGGCAGCG 




301 TTGATGCAGG TCTCGGTACT GGTGCTGCTG CTTTCAGAAA TTGGAAGATA 
351 A 

This corresponds to the amino acid sequence <SEQ ID 96; ORF18>: 

1 ..GNGWQADPEH PLLGLFAVSN VSMTLAFVGI CALVHYCFSG TVQVFVFAAL
51 LKLYALKPVY WFVLQFVLMA VAYVHRCGID RQPPSTFGGS QLRLGGLTAA

101 LMQVSVLVLL LSEIGR*

Further work revealed the complete nucleotide sequence <SEQ ID 97>: 

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC

151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA 

251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 

451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA

501 GCCGCCGTCA ACGTTCGGCG GCTCGCAGCT GCGACTCGGC GGGTTGACGG 

551 CAGCGTTGAT GCAGGTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA 

601 AGATAA

This corresponds to the amino acid sequence <SEQ ID 98; ORF18-1>:

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP
51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL
101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQVS VLVLLLSEIG

201 R*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF18 shows 98.3% identity over a 116aa overlap with an ORF (ORF18a) from strain A of N.
meningitidis:



                                                  10        20        30
orf18.pep                                 GNGWQADPEHPLLGLFAVSNVSMTLAFVGI
                                          ||||||||||||||||||||||||||||||
orf18a      TRAAPLFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI
               60        70        80        90       100       110

                    40        50        60        70        80        90
orf18.pep   CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS
            ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
orf18a      CALVHYCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS
              120       130       140       150       160       170

                   100       110
orf18.pep   QLRLGGLTAALMQVSVLVLLLSEIGRX
            ||||||||||||| |||||||||||||
orf18a      QLRLGGLTAALMQXSVLVLLLSEIGRX
              180       190       200

The complete length ORF18a nucleotide sequence <SEQ ID 99> is: 

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC 

151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA 

251 CGGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCTCT GCTCGGGCTG 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC

351 GTTGGTGCAT TATTGCTTTT CGNGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 






451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA 

501 GCCGCCGTCA ACGTTCGGCG GNTCGCAGCT GCGACTCGGC GGGTTGACGG 

551 CAGCGTTGAT GCAGNTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA 

601 AGATAA 

This encodes a protein having amino acid sequence <SEQ ID 100>:

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP

51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL

101 FAVSNVSMTL AFVGICALVH YCFSXTVQVF VFAALLKLYA LKPVYWFVLQ

151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQXS VLVLLLSEIG

201 R*
ORF18a and ORF18-1 show 99.0% identity in 201 aa overlap:

                    10        20        30        40        50        60
orf18a.pep  MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf18a.pep  LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf18a.pep  YCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
            |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
                   130       140       150       160       170       180

                   190       200
orf18a.pep  GLTAALMQXSVLVLLLSEIGRX
            |||||||| |||||||||||||
orf18-1     GLTAALMQVSVLVLLLSEIGRX
                   190       200




Homology with a predicted ORF from N. gonorrhoeae

ORF18 shows 93.1% identity over a 116aa overlap with a predicted ORF (ORF18.ng) from N.
gonorrhoeae:

orf18.pep                                 GNGWQADPEHPLLGLFAVSNVSMTLAFVGI 30
                                          ||||||||||||||||||||||||||||||
orf18ng     TRAAPLFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI 115

orf18.pep   CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS 90
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng     CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS 175

orf18.pep   QLRLGGLTAALMQVSVLVLLLSEIGR 116
            ||||| |:| ||||:| ::||:||||
orf18ng     QLRLGVLAAMLMQVAVTAMLLAEIGR 201

The complete length ORF18ng nucleotide sequence is <SEQ ID 101>: 



1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGt aTGCGGcggt
51 tttTctgTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA
101 GTATTGCGTT GTGGCTCGGC ATCTCGGTTT TAGGGGTAAA GCTGATGCCG
151 GGGATGTGGG GAATGACCCG CGCCGCGCCT TTGTTCATCC CCCATTTTTA
201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGTATTGG AACCGGAAAA
251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT
301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC
351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG
401 CATTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG
451 TTTGTATTGA TGGCGGttgC CTATGTCCAC CGCTGCGGTA TAGACCGGCA
501 GCCGCCGTCA ACGTTCGGCG GTTCGCAGCT GCGACTCGGC GTGTTGGCGG
551 CGATGTTGAT GCAGGTTGCG GTAACGGCGA TGCTGCTTGC CGAAATCGGC
601 AGATGA

This encodes a protein having amino acid sequence <SEQ ID 102>:

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIALWLG ISVLGVKLMP
51 GMWGMTRAAP LFIPHFYLTL GSIFFFIGYW NRKTDGNGWQ ADPEHPLLGL

101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG VLAAMLMQVA VTAMLLAEIG
201 R*

This ORF18ng protein sequence shows 94.0% identity in 201 aa overlap with ORF18-1:

                    10        20        30        40        50        60
orf18-1.pep MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
            ||||||||||||||||||||||||||||||||||| |||||||||:|||||:||||||||
orf18ng     MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIALWLGISVLGVKLMPGMWGMTRAAP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf18-1.pep LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
            ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf18ng     LFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf18-1.pep YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng     YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
                   130       140       150       160       170       180

                   190       200
orf18-1.pep GLTAALMQVSVLVLLLSEIGRX
             |:| ||||:| ::||:|||||
orf18ng     VLAAMLMQVAVTAMLLAEIGRX
                   190       200
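
The identity figure for ORF18-1 and ORF18ng can be recomputed directly, because the alignment contains no gaps. The following is a minimal sketch; the two sequences are transcribed from SEQ ID 98 and SEQ ID 102 (without the terminal stop), and the function names are illustrative assumptions.

```python
ORF18_1 = (
    "MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP"
    "LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH"
    "YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG"
    "GLTAALMQVSVLVLLLSEIGR"
)
ORF18NG = (
    "MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIALWLGISVLGVKLMPGMWGMTRAAP"
    "LFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH"
    "YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG"
    "VLAAMLMQVAVTAMLLAEIGR"
)


def mismatches(a, b):
    """List (1-based position, residue in a, residue in b) for each difference."""
    if len(a) != len(b):
        raise ValueError("an ungapped comparison needs equal-length sequences")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def percent_identity(a, b):
    """Percentage of aligned positions carrying the same residue."""
    return 100.0 * (len(a) - len(mismatches(a, b))) / len(a)


print(f"{percent_identity(ORF18_1, ORF18NG):.1f}% over {len(ORF18_1)} aa")
for position, meningococcal, gonococcal in mismatches(ORF18_1, ORF18NG):
    print(position, meningococcal, gonococcal)
```

Two thirds of the substitutions fall in the last twenty residues, which is also where the partial ORF18 sequence diverges from ORF18ng.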

Based on this analysis, including the presence of several putative transmembrane domains in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
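
The text does not state how the putative transmembrane domains were predicted. As a rough illustration only, a Kyte-Doolittle hydropathy scan can flag candidate membrane-spanning stretches in ORF18ng; the 19-residue window and the 1.6 threshold are conventional choices assumed here, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# ORF18ng, transcribed from SEQ ID 102 without the terminal stop.
ORF18NG = (
    "MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIALWLGISVLGVKLMPGMWGMTRAAP"
    "LFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH"
    "YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG"
    "VLAAMLMQVAVTAMLLAEIGR"
)


def window_means(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    return [
        sum(KD[residue] for residue in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows at or above the threshold into 1-based spans."""
    segments = []
    for i, mean in enumerate(window_means(seq, window)):
        if mean < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # extend the current span
        else:
            segments.append((start, end))
    return segments


for start, end in hydrophobic_segments(ORF18NG):
    print(f"candidate hydrophobic segment: residues {start}-{end}")
```

A scan of this kind only suggests where membrane-spanning helices might lie; it does not establish topology or surface exposure.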

Example 13 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 103>: 

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTN ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC NCNTGACCGG ACGGCTNAAA AACATCATCA CCACCGTCGC 

201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC 

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CTT.CG.CTT CACCATTTTA 

301 GGCGCGGNCG ...

This corresponds to the amino acid sequence <SEQ ID 104; ORF19>: 

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 
51 LDNXXTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTXXFTIL
101 GAX. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 105>:

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCA CCACCGTCGC 
201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT CACCATTTTA 

301 GGCGCGGTCG GGCTCAAATA CCGCACCTTC GCCTTCGGTG CACTCGCCGT 

351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 

401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCCTC 

451 CTGTTCCAAA TCGTCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA 

501 CGCCTACGAC GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG 

551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG 

601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 

651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 

701 GTTACTACTT TGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC

751 GTCGATTATC AGGAAATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 

801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG 

851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC 

901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA 

951 CGACAGTCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA 

1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA 

1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCAGCAGCCT 

1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG 

1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC 

1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC 

1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC 

1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 

1351 TACTTCACCC CGTCTGTCGA AACCAAACTC TGGATTGTCA TCGCCAGTAC 

1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT 

1451 TCATTACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA

1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT 

1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 

1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGTGC CTATCTCGAA 

1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA 

1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 

1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA 

1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 

1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 

1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA 

1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT 

2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC 

2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGACAGCT CGAACCCTAC 

2101 TACCGCGCCT ACCGCCAAAT TCCGCACAGG CAGCCCCAAA ATGCAGCCTG 

2151 A 

This corresponds to the amino acid sequence <SEQ ID 106; ORF19-1>:

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAIL
151 LFQIVLPHRP VQESVANAYD ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG
301 RAIEGCRQSL RLLSDSNDSP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE
351 NDRMGDTRIA ALETSSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP
451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE
551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ
651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY
701 YRAYRQIPHR QPQNAA*

Computer analysis of this amino acid sequence gave the following results: 

Homology with predicted transmembrane protein YHFK of H. influenzae (accession number P44289) 
ORF19 and YHFK proteins show 45% aa identity in 97 aa overlap: 

orf19   6 LKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLKNIITT 65 
          L   +I+++PVF +V  AA  +W       +MP +LGIIAGGLVDLDN  TGRLKN+  T 
YHFK    5 LKAKVISTIPVFIAVNIAAVGIWFFDISSQSMPLILGIIAGGLVDLDNRLTGRLKNVFFT 64 

orf19  66 VALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGA 102 
          +  F++SS   Q  +G  + +I+ MT++T  FT++GA 
YHFK   65 LIAFSISSFIVQLRIGKPIQYIVLMTVLTFIFTMIGA 101 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF19 shows 92.2% identity over a 102aa overlap with an ORF (ORF19a) from strain A of N. meningitidis: 

                    10        20        30        40        50        60 
orf19.pep   MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK 
            |||| ||||||||||||||||||||||||||||||||||||||||||||||||  ||||| 
orf19a      MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 
                    10        20        30        40        50        60 

                    70        80        90       100 
orf19.pep   NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX 
            |||:||||||||||:|||||||||||||||||||  |||:|| 
orf19a      NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY 
                    70        80        90       100       110       120 

orf19a      TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQENVANAYEALGSYLEAKA 
                   130       140       150       160       170       180 



The complete length ORF19a nucleotide sequence <SEQ ID 107> is: 

1 ATGAAAACCC CACCCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 
51 CGCCAGTGTC TTTACCGCCG CCTCCATCGT CTGGCAGCTG GGCGAACCCA 
101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCTGGCGG CCTGGTCGAT 
151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC 
201 CCTGTTCACC CTCTCCTCAC TTGTCGCGCA AAGCACCCTC GGCACAGGTT 
251 TGCCATTCAT CCTCGCCATG ACCCTGATGA CTTTCGGCTT TACCATCATG 
301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT 
351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 
401 ACCCCTTTAT GATTCTGTGC GGAACCGTAC TGTACAGCAC CGCCATCATC 
451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTTCAAGAAA ACGTCGCCAA 
501 CGCCTACGAA GCACTCGGCA GCTACCTCGA AGCCAAAGCC GACTTTTTCG 
551 ATCCCGACGA AGCCGAATGG ATAGGCAACC GCCACATCGA CCTCGCCATG 
601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 
651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 
701 GCTACTACTT CGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC 
751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 
801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG 
851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC 
901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA 
951 CGACAATCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA 
1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA 
1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCGGCAGCCT 
1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG 
1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTTG TCGTTGCCGC CGCCTGCACC 
1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC 
1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC 
1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 
1351 TACTTTACCC CCTCCGTCGA AACCAAACTC TGGATCGTCA TCGCCAGTAC 
1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGCTTC TCGACATTTT 
1451 TCATCACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG GTTGGACGTA 
1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT 
1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 
1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGCGC CTATCTCGAA 
1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA 
1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 
1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA 
1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 
1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 
1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA 
1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT 
2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC 
2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGGCAGCT CGAACCCTAC 
2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG 
2151 A 

This encodes a protein having amino acid sequence <SEQ ID 108>: 

1 MKTPPLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 
51 LDNRLTGRLK NIIATVALFT LSSLVAQSTL GTGLPFILAM TLMTFGFTIM 
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII 
151 LFQIILPHRP VQENVANAYE ALGSYLEAKA DFFDPDEAEW IGNRHIDLAM 
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH 
251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG 
301 RAIEGCRQSL RLLSDSNDNP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE 
351 NDRMGDTRIA ALETGSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT 
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP 
451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV 
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE 
551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ 
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ 
651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY 
701 YRAYRQIPHR QPQNAA* 



ORF19a and ORF19-1 show 98.3% identity in 716 aa overlap: 
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10 20 30 40 50 60 

or f 19a pep " MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 

| | | } | 1 | I I I t t 1 I t 1 1 I t t 1 1 1 I I I I 1 I I t I I 1 I I I M I I I K 1 1 1 I I I M M I 

orfl9-l MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orfl9a pep NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY 
| | | : | | I I H I I I I : I I I I I I I i I I I I I I I I I I i I I M I : 111 I M I ! I I I 1 I I M M I I 
orf 19-1 NUTTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 

70 80 90 100 110 120 

130 140 150 160 170 180 

orfl9a pep TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQEKVANAYEALGSYLEAKA 
til Ml tilt I II II til tllttll I 111:111 1:1 tl I! 111:111 I I: I II : II II II 
orfl9-l TT LT YTPET YWLTN PFMI LCGTVLY STAI LLFQ I VLPHRPVQE SVANAYDALGG YLE AKA 

130 140 150 160 170 180 

190 200 210 220 230 24C 

orf 1 9a pep DFFDPDEAEWIGNRKIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 
MMMII I I I I I I t 1 I 1 1 I I I I 1 1 > 1 1 i 1 1 I 1 I 1 1 I i I I 1 I 1 1 I I I t I I I 1 I I 1 1 1 1 I 
orf 19-1 DFFDPDEAAWIGNRKIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 1 9a pep DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG 

mm 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 * 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 19-1 DIHERISSAHVDYOEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 1 9a pep RAIEGCRQSLRLLSDSNDNPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA 
| M | | | | | | | 1 | | I II I I : I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I i I 
orf 19-1 RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDOQFRQLQHNGLQAENDRMGDTRIA 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 19a pep ALETGSLKNTWQAIRPQLNIXSGVFRHAVRLSLWAAACTIVEALNLNLGYWILLTALFV 
|| | | : || | | | | | I ill II I I II I I II I I I I I II I I I I I 1 I I I I I I 1 I I I 1 I M I I I I I I I 
orf 19-1 ALET S S LKNTWQAI R PQLN LESGVFRHAVRLS LWAAACT I VE ALN LNLGYW I LLT AL FV 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 1 9a pep CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF 
| 1 ] | | I I I I I 1 1 I t I I 1 t I I i t t I I I I I I 1 I 1 t I t I I 1 1 1 I I t 1 I I I I I 1 I I I I I I I t I I 
orf 19-1 CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF 
430 440 450 460 470 480 

490 500 510 520 530 540 

orf 19a pep STFFITIQALTSLSLAGLOVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL 
| | | | | | | 1 I I I I! I I I I I I I I I ! I I I I I I t I I I I I I It I I I I M 1 I I I M I I I I I I I I I I 
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orfl9a.pep 
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orfl9-l STFF iTlQALTSI^UlGLD\^AAMP\milOTIlGASIAWAAVSYLWPDWKYLTlXRT^ 
490 500 510 520 530 54U 

550 560 570 580 590 600 

AVCSNGAYI£KITERD(SGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ 

orfl9-l ivCSNGAYIXKITERLKSGETGDDVEYRATRRRAHEHT 

610 620 630 640 650 660 

orf 1 9a . pep PGFTLLKTGYALTGYISAliGAYRSEMHEECSPDFTAQFHIAA^ 

° P P t | | | | | 1 | 1 I I I I I 1 1 I I I I I I 1 1 I I 1 t 1 1 I I I I I I I I II II Ml I I ' IM MM 

orf 19-1 PGETLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF 

610 620 630 640 650 660 

670 680 690 700 710 

orf 19a pep QTALDTLRGELDTI^THSSGTQSHILU2QLQLIARQLEPYYRAYRQIPHRQPQNAAX 
P P , ,, | | | | n | | | | | | | | | | | | II I I I 1 1 I M I M I M I I M I I I M I 

orf 19-1 QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX 



20 670 680 690 700 710 

Homology with a predicted ORF from N. gonorrhoeae 

ORF19 shows 95.1% identity over a 102aa overlap with a predicted ORF (ORF19.ng) from N. 
gonorrhoeae: 

orf19.pep   MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK 60 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||  ||||| 
orf19ng     MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 60 

orf19.pep   NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX 103 
            |||:||||||||||||||||||||||||||||||  |||||| 
orf19ng     NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGAIAVATY 120 

An ORF19ng nucleotide sequence <SEQ ID 109> is predicted to encode a protein having amino 
acid sequence <SEQ ID 110>: 

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 
51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL 
101 GAVGLKYRTF AFGAIAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII 
151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM 
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH 
251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG 
301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE 
351 NDRMGDTRIA ALETGSFKNT * 

Further work revealed the complete nucleotide sequence <SEQ ID 111>: 

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 
51 CGCCAGTGTC TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 
101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTGGTCGAT 
151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC 
201 CCTGTTTACC CTCTCCTCGC TCACGGCGCA AAGCACCCTC GGCACAGGGC 
251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT TACCATTTTA 
301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT 
351 CGCCACCTAC ACCACGCTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 
401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCATC 
451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA 
501 TGCCTACGAA GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG 
551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG 
601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 
651 TTACCGTTTG CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 
701 GCTACTACTT CGCCGCCCAA GACATCCACG AACGCATCAG CTCCGCCCAC 
751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 
801 CCGCATCCGC CGCCTGCTCG AAATGCAGGG GCAGGCGTGC CGCAACACCG 
851 CCCAAGCCAT CCGGTCGGGC AAAGACTAcg tTTACAGCAA ACGCCTCGGA 
901 CGCGCCATcg aaggctgCCG CCAGTCGCtg cgcctCCTTt cagacggcaA 
951 CGACAGTCCC GACATCCGCC ACCTGAGccg CCTTCTCGAC AACCTCGgca 
1001 GCGTcgacca gcagtTCcgc caactCCGAC ACAgcgactC CCCCGCcgaa 
1051 Aacgaccgca tgggcgacaC CCGCATCGCC GCCCtcgaaa ccggcagctT 
1101 caaaaaCAcc tggcaggCAA TCCGTCCGCa gctgaaCCTC GAATCatgCG 
1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC 
1201 ATCGTCgaag cCCTCAACCT CAACCTCGGC TACTGGATAC TGCTGACCGC 
1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTGTACC 
1301 AACGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 
1351 TACTTCACCC CCTCCGTCGA AACCAAACTC TGGATTGTCA TCGCCGGTAC 
1401 CACCCTGTTC TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT 
1451 TCATCACCAT TCAGGCACTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA 
1501 TACGCCGCCA TGCCCGTGCG CATCATcgaC ACCATTATCG GCGCATCCCT 
1551 TGCCTGGGCG GCGGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 
1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAGCGGCAC ATACCTCCAA 
1651 AAAATTGCCG AACGCCTCAA AACCGGCGAA ACCGGCGACG ACATAGAATA 
1701 CCGCATCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 
1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA 
1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 
1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 
1901 TTACCGCACA GTTCCACCTT GCCGCCGAAC ACACCGCCCA CATCTTCCAA 
1951 CACCTGCCCG ACATGGGACC CGACGACTTT CAGACGGCAT TGGATACACT 
2001 GCGCGGCGAA CTCGGCACCC TCCGCACCCG CAGCAGCGGA ACACAAAGCC 
2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CccgGCAACT CGAACCCTAC 
2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG 
2151 A 



This corresponds to the amino acid sequence <SEQ ID 112; ORF19ng-1>: 

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 
51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL 
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII 
151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM 
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH 
251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG 
301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE 
351 NDRMGDTRIA ALETGSFKNT WQAIRPQLNL ESCVFRHAVR LSLVVAAACT 
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVYQRIAGTV LGVIVGSLVP 
451 YFTPSVETKL WIVIAGTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV 
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSSGTYLQ 
551 KIAERLKTGE TGDDIEYRIT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ 
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ 
651 HLPDMGPDDF QTALDTLRGE LGTLRTRSSG TQSHILLQQL QLIARQLEPY 
701 YRAYRQIPHR QPQNAA* 



ORF19ng-1 and ORF19-1 show 95.5% identity in 716 aa overlap: 
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10 20 30 40 50 60 

orf 19-1. pep MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 
I I | | | | I | | | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I II I M M I ! I I I I I I I 

orfl9ng-l mktpllkpllitslpvfasvftaasivwqlgepklampfvlgiiagglvdldnrltgrlk 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 19-1 . pep niittvalftlssltaqstlgtglpfilamtlmtfgftilgavglkyrtfafgalavaty 

I i t : I I I I 1 I I It I I I I I I I I I I I I I I 1 I I I I I M Ml I I t I I I I I I I 1 I I II I ! I I I I I 

orfl9ng-l niiatvalftlssltaqstlgtglpfiiamtlmtfgftilgavglkyrtfafgalavaty 

70 80 90 100 110 120 

130 140 150 160 170 180 

or f 1 9-1 . pep TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQ'lVLPHRPVQESVANAYDALGGYLEAKA 
| I I I I I I I I I 1 I ! 11 I I I I II M I I I I I I : I I I I : I I I I I I I I I I M i I : I I M I I I I I t 
orfl9ng-l TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQESVANAYEALGGYLEAKA 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf 19-1 . pep DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 r i M 1 1 1 1 il 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 

orfl9ng-l DFFDP DEAAW I GNRHI DLAM SNTG V I T AFNQCRS ALFYRLRGKHRH PRTAKMLR YY FAAQ 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 19-1. pep DIHERI SSAHVDYQEMSEKFKNTDI I FRIHRLI^MQGQACRNTAQAIJ^AS KDYVYSKRLG 
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orf 19ng-l 

orfl9-l.pep 
orf 19ng-l 
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orf 19-1. pep 
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orfl9-l.pep 
orfl9ng-l 

orf 19-1. pep 
orf 19ng-l 
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orf 19ng-l 
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1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 | | 1 1 1 1 1 1 1 1 : 1 M 1 1 1 1 1 1 II I I M I ' I 1 1 1 

DIHERISSAHVDYQEMSEKFKNTDIIFRIRRLLEMQGQACRNTAQAIRSGKDYVYSKRLG 

250 260 270 2B0 290 300 

310 320 330 340 350 3 *<> 

RAIEGCRQSLRU,SDSNDSPDIRHLRR^ DNL GSVDQQFRQLQHNGLQAENDRMGDTRIA 

1 1 I I M 1 1 III | I 1 1 : I 1 I I I I I 1 1 I I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 : 1 : I I IMIMM II 

RAIEGCRQSLRLLSDGNDSPDIRHLSRLLDNLGSVOQQFRQLRHSDSPAENDRMGDTRIA 
310 320 330 340 350 360 

370 380 390 400 410 420 

ALET S S LKNTWQAI RPQLNLESG VFRHAVRLS LWAAACT I VEALNLNLG YW ILLTALFV 
| I I I : I : I 1 1 1 1 1 I I II I I 1 1 I IIMIMIMIMHIIIIlMIIIMMIinilN 
aixtgsfkntwqairpqi^ij:scvfrhavrlslwaaactivealnlnlgywilltalfv 

370 380 390 400 410 420 

430 440 450 460 470 480 

cqpnytatksrvrqriagtvlgvivgslvpyftpsvetklwiviasttlffmtrtykvsf 

I I 1 1 t I 1 1 t i I I llllllllllllllMKMilMlMlilll'NIIIIIIIIilll 

cqpnytatksrvyqriagtvlgvivgslvpyftpsvetklwiviagttlffmtrtykysf 

430 440 450 460 470 480 

490 500 510 520 530 540 

STFFIT IQALTSLSLAGLDVYAAMPVRI I DTI IGASLAWAAVSYLWPDWKYLTLERTAAL 

iiiiiiumimMiiiiiiiiiiiiinmiiimiiMiiiimimiiii 

STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL 
490 500 510 520 530 540 

550 560 570 580 590 600 

AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ 

||||:|:||:||:INI:l IIMI:IM I I I I I 1 II I I I 1 II I I I M I I I I I I I I I I I I 
AVCSSGTYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQ 
550 560 570 580 590 600 

610 620 630 640 650 660 

PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHIAAEHTAHIFQHLPETEPDDF 

I I I I I I I I I I M I M I I I II H I II I I I I M I I M I I I I I I M I I I II I I t I I : MM 
PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPDMGPDDF 
610 620 630 640 650 660 

670 680 690 700 710 

QTALDTLRGELDTLRTHSSGTQSRILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX 
tilllllim I I I I : I I I I 1 t I I M I I I I M II I II M I I I M I I I 1 I I I I M M 
QTALDTLRGELGTLRTRSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX 

670 680 690 700 710 
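
The identity figures quoted for these alignments are the proportion of aligned positions at which both sequences carry the same residue. As a minimal sketch only (the function name percent_identity is illustrative and is not part of the original analysis, whose figures were produced by an alignment program rather than by this code), the calculation can be reproduced in Python for the ungapped C-terminal segment, residues 661-716, taken from the listings of SEQ ID 106 and SEQ ID 112:

```python
def percent_identity(seq_a, seq_b):
    """Return (identical positions, percent identity) for two aligned,
    equal-length sequences. Gaps are not handled in this sketch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches, 100.0 * matches / len(seq_a)


# Residues 661-716 as listed in SEQ ID 106 (ORF19-1) and SEQ ID 112 (ORF19ng-1).
orf19_1 = "QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAA"
orf19ng_1 = "QTALDTLRGELGTLRTRSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAA"

matches, pct = percent_identity(orf19_1, orf19ng_1)
print(f"{matches}/{len(orf19_1)} identical residues ({pct:.1f}% identity)")
```

Over this 56-residue segment the two sequences differ at two positions (D/G at 672 and H/R at 677), i.e. 54/56 or about 96.4% identity; the 95.5% figure quoted above is the same ratio taken over the full 716 aa overlap.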






In addition, ORF19ng-1 shows significant homology to a hypothetical gonococcal protein 
previously entered in the databases: 

sp|O33369|YOR2_NEIGO HYPOTHETICAL 45.5 KD PROTEIN (ORF2) gnl|PID|e1154438 
(AJ002423) hypothetical protein [Neisseria gonorrh) Length = 417 

Score = 1512 (705.6 bits), Expect = 5.3e-203, P = 5.3e-203 

Identities = 301/326 (92%), Positives = 306/326 (93%) 

Query: 307 RQSLRLLSDGNDSPDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS 366 
           RQSLRLLSDGNDS DIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS 
Sbjct:   1 RQSLRLLSDGNDSXDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS 60 

Query: 367 FKNTWQAIRPQLNLESCVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFVCQPNYT 426 
           FKNTWQAIRPQLNLES VFRHAVRLSLVVAAACTIVEALNLNLGYWILLT LFVCQPNYT 
Sbjct:  61 FKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTRLFVCQPNYT 120 

Query: 427 ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT 486 
           ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT 
Sbjct: 121 ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT 180 

Query: 487 IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG 546 
           IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG 
Sbjct: 181 IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG 240 






WO 99/24578 PCT/IB98/01665 

Query: 547 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQPGFTLL 606 
           TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFAD+  P 
Sbjct: 241 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADTCNPALPCS 300 

Query: 607 KTGYALTGYISALGAYRSEMHEECSP 632 
           K   ALTGYISALG   ++  +  +P 
Sbjct: 301 KPATALTGYISALGHTAAKCTKNAAP 326 

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein (the first of which is also seen in the meningococcal protein), and on homology 
with the YHFK protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
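
The specification does not state which program was used to predict the putative transmembrane domains. One widely used approach is a sliding-window hydropathy average on the Kyte-Doolittle scale, and the following Python sketch illustrates that approach purely as an assumption. The window of 19 residues and the threshold of 1.6 are the conventional Kyte-Doolittle heuristics, and the function names hydropathy_profile and candidate_windows are illustrative, not taken from the original analysis.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean reaches the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, score in enumerate(profile) if score >= threshold]


# First 60 residues of ORF19-1 (SEQ ID 106), used here only as sample input.
n_terminus = "MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK"
print(candidate_windows(n_terminus))
```

Windows reported by such a scan are only candidates; a prediction of this kind would normally be confirmed with a dedicated topology program or experimentally.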

Example 14 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 113>: 

1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 
101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 
151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 
201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGG.C GAAGCCTTTA 
251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG 
301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGAGTT 
351 TTGCCCAAGA TGCCGACAAA TTTCAGCTCT CCATCGATTT GCTGCGGATT 
401 ACGTTTCCTT ATATATTATT GATTTCCCTG TCTTCATTTG TCGGCTCGGT 
451 ACTCAATTCT TATCATAAGT TCGGCATTCC GGCGTTTACG CCAC.GTTTC 
501 TGAACGTGTC GTTTATCGTA TTCGCGCTGT TTTTCGTGCC GTATTTCGAT 
551 CCGCCCGTTA CCGCGCyGGC GTGGGCGGTC TTTGTCGGCG GCATTTTGCA 
601 ACTCGrmTTC CAACTGCCCT GGCTGGCGAA ACTGGGCTTT TTGAAACTGC 
651 CCAAACtGAG TTTCAAAGAT GCGGCGGTCA ACCGCGTGAT GAAACAGATG 
701 GCGCCTGCgA TTTTgGGCGT GAgCGTGGCG CAGGTTTCTT TGGTGATCAA 
751 CACGATTTTc GCGTCTTATC TGCAATCGGG CAGCGTTTCA TGGATGTATT 
801 ACGCCGACCG CATGATGGAG CTGCCCAGCG GCGTGCTGGG GGCGGCACTC 
851 GGTACGATTT TGCTGCCGAC TTTGTCCAAA CACTCGGCAA ACCaAGATAC 
901 GGaACAGTTT TCCGCCCTGC TCGACTGGGG TTTGCGCCTG TGCATGCtgc 
951 TGACGCTGCC GGCGgcGGTC GGACTGGCGG TGTTGTCGTT cCCgCtGGTG 
1001 GCGACGCTGT TTATGTACCG CGwATTTACG CTGTTTGACG CGCAGATGAC 
1051 GCAACACGCG CTGATTGCCT ATTCTTTCGG TTTAATCGGC TTAATCATGA 
1101 TTAAAGTGTT GGCACCCGGC TTCTATGCGC GGCAAAACAT CAAwAmGCCC 
1151 GTCAAAATCG CCATCTTCAC GCTCATCTGC mCGCAGTTGA TGAACCTTGs 
1201 CTTTAyCGGC CCACTrrAAC rCasTCGGAC TTTCGCTTGC CATCGGTCTG 
1251 GGCGCGTGTA TCAATGCCGG ATTGTTGTTT TACCTGTTGC GCAGACACGG 
1301 TATTTACCAA CCTGG.CAAG GGTTGGGCAG CGTTCTT.AG CAAAAATGCT 
1351 GcTCTCGCTC GCCGTGA 

This corresponds to the amino acid sequence <SEQ ID 114; ORF20>: 

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAX EAFIRHVAGM LSFVLVIVTA 
101 LGILAAPWVI YVSAPSFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV 
151 LNSYHKFGIP AFTPXFLNVS FIVFALFFVP YFDPPVTAXA WAVFVGGILQ 
201 LXFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN 
251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT 
301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR XFTLFDAQMT 
351 QHALIAYSFG LIGLIMIKVL APGFYARQNI XXPVKIAIFT LICXQLMNLX 
401 FXGPLXXIGL SLAIGLGACI NAGLLFYLLR RHGIYQPXQG LGSVLXQKCC 
451 SRSP* 
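
The X residues in SEQ ID 114 appear where the preliminary nucleotide sequence <SEQ ID 113> contains an ambiguity symbol or an undetermined base, so that the codon cannot be assigned to a single amino acid. The following Python sketch illustrates this under stated assumptions: it uses the standard genetic code and the IUPAC nucleotide ambiguity codes, the function name translate is illustrative, and treating any other symbol (such as '.') as an unknown base is an assumption rather than something stated in the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate(dna):
    """Translate in frame 1; emit X when a codon is ambiguous."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Assumption: any unrecognised symbol (e.g. '.') is an unknown base.
        options = [IUPAC.get(base, "ACGT") for base in dna[i:i + 3]]
        residues = {CODON_TABLE["".join(codon)] for codon in product(*options)}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


print(translate("ACCGCGCyGGCG"))  # in-frame fragment from row 551 of SEQ ID 113
print(translate("ACCGCGCTGGCG"))  # corresponding fragment of SEQ ID 115
```

The first fragment gives TAXA, because CyG may be CCG (Pro) or CTG (Leu); the second, from the complete sequence <SEQ ID 115>, gives TALA, matching residues 187-190 of ORF20 and ORF20-1 respectively.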

These sequences were elaborated, and the complete DNA sequence <SEQ ID 115> is: 

1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 
101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 
151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 
201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGGCG GA^GCTTTTA 
251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG 
301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG ^CCGGTTT 
351 TGCCCAAGAT GCCGACAAAT TTCAGCTCTC CATCGATTTG CTGCGGATTA 
401 CGTTTCCTTA TATATTATTG ATTTCCCTGT CTTCATTTGT CGGCTCGGTA 
451 CTCAATTCTT ATCATAAGTT CGGCATTCCG GCGTTTACGC CCACCTTTCT 
501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC 
551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTCT TTGTCGGCGG CATTTTGCAA 
601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC 
651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG 
701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGGTTTCTTT GGTGATCAAC 
751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA 
801 CGCCGACCGC ATGATGGAGC TGCCCAGCGG CGTGCTGGGG GCGGCACTCG 
851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 
901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT 
951 GACGCTGCCG GCGGCGGTCG GACTGGCGGT GTTGTCGTTC CCGCTGGTGG 
1001 CGACGCTGTT TATGTACCGC GAATTTACGC TGTTTGACGC GCAGATGACG 
1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGCT TAATCATGAT 
1101 TAAAGTGTTG GCACCCGGCT TCTATGCGCG GCAAAACATC AAAACGCCCG 
1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTTGCC 
1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG 
1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA 
1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTAGCAAA AATGCTGCTC 
1351 TCGCTCGCCG TGATGTGCGG CGGACTGTGG GCAGCGCAGG CTTACCTGCC 
1401 GTTTGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA 
1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG 
1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAACTGA 

This corresponds to the amino acid sequence <SEQ ID 116; ORF20-1>: 

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAA EAFIRHVAGM LSFVLVIVTA 
101 LGILAAPWVI YVSAPGFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV 
151 LNSYHKFGIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ 
201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN 
251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT 
301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR EFTLFDAQMT 
351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA 
401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL 
451 SLAVMCGGLW AAQAYLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL 
501 GFRPRHFKRV EN* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the MviN virulence factor of S. typhimurium (accession number P37169) 

ORF20 and MviN proteins show 63% aa identity in 440aa overlap: 

45 Orf20 1 MNMLGALAKVG S LTMVSRVLG FVRDTV I ARAFGAGMAT DAFFV AFKLPN LLRRVFAEG AF 60 

urrzu i rt t%r„ oM eDUTr.r un FftAGMATDAFFVAFKLPNLLRR+FAEGAF 



0t MN+L +LA V S+TM SRVLGF RD ++AR FGAGMAT D AFFV AFKL PNLLRR+ FAEGAF 

MviN 14 MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF 73 

Orf20 61 AQAFVPIIAEYKETRSKEAXEAFIRHVAGMLSFVLVI\rTAU3ILAAPWVIYVSAPSFAQD 120 
50 +QAFVPI LAEYK + +EA F+ +V+G+L+ L +VT G+LAAPWVI V+AP FA 

MviN 74 S QAFV P I LAE YKS KQG EEAT R I FVAYV S G LLT LALA WTV AGMLAA P WV IMVT APG FADT 133 

Orf20 121 ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 180 
ADKF L+ LLRITFPYILLISL+S VG++LN++++F IPAF P FLN+S I FALF P 
55 MviN 134 ADKFALTTQLLRITFPYILLISLASLVGAILNTWNRFSIPAFAPTFLNISMIGFALFAAP 193 

Orf20 181 YFDPPVTAXAWAVFVGG I LQLX FQLPWLAKLG FLKLPKLS FKDAAVNRVMKQMAPAI LGV 240 
YF+PPV A AWAV VGG+LQL +QLP+L K+G L LP+++F+D RV+KQM PAILGV 
YFNPPVLAUVWAVTVGGVLQLVYQLPYLKKIGMLVLPRINFRDTGAMRVVKQMGPAILGV 253 



60 



MviN 194 



Orf20 241 S VAQVSLVINT I FAS YLQSGSVSWMYYADRMMELPSGVLGAALGT I LLPTLSKHSANQDT 300 

SV+Q+SL+INTIFAS+L SG SVS WMYYADR+ME PSGVLG ALGTILLP+LSK A+ + 
MviN 254 SVSQISLIINTIFASFLASGSVSWMYYADRLMEFPSGVLGVALGTILLPSLSKSFASGNH 313 
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10 



Orf20 


301 


MviN 


314 


Orf20 


361 


MviN 


374 


Orf20 


421 


MviN 


434 



+++ L+DWGLRLC LL LP+AV L +L+ PL +LF V FT FDA MTQ ALIAYS G 



LIGLI++KVIAPGFY+RQ+I PVKIAI TLI QIjMNL F 



NA LL++ LR+ 1+ P G 



C+ 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF20 shows 93.5% identity over a 447aa overlap with an ORF (ORF20a) from strain A of N. meningitidis: 
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orf 20. pep 
orf20a 

orf 20. pep 
orf20a 

orf 20. pep 
orf20a 

orf 20. pep 
orf20a 

orf 20. pep 
orf20a 

orf 20. pep 
orf20a 

orf 20. pep 
orf20a 

orf 20. pep 
orf20a 



10 20 30 40 50 60 

MNMLG ALAKVG S LTMV S R VLG FVR DTVI ARAFGAGMAT DA F FV AFKLPN LLRR V FAE GAF 
i I I t I I t : t I I I i M I I I I I I I i I M I I I I I 1 t I I I I I t i t i I I I I I I I I I I M I I 1 I I i 
MNMLG ALVKVG S LTMV S R V LG FVR DTV I ARAFGAGMAT DA F FV AFKL PN LLRRV FAE G A F 

10 20 30 40 50 60 

70 80 90 100 110 120 

AOAFVPILAEYKETRSKEAXEAFIRHVAG MLSFVLVIVTALGILAA PWVIYVSAPSFAQD 

I | | | | I | I || I I I I I I I I 1:1 I I I I I I I I I I I I I I I I I II U I I I I I I I M I I ! I: I I: I 
AOAFVPILAEYKETRSKEATEAFIRHVAG MLSFVLVIVTALGILAA PWVIYVSAPGFAKD 
70 80 90 100 110 120 

130 140 150 160 170 180 

ADKFOLSIDLLRIT FPYILLISLSSFVGSVL NSYHKFGIPAFTPX FLNVSFIVFALFFVP 

I | | | || | I I I I I I I I I I I I II I I I I I I I I I I I I I I I I : I I I I I I: I I U I M I I I I I I I I 
ADKFQLSIDLLRIT FPYILLISLSSFVGSVL NSYHKFSIPAFTPT FLNVSFIVFALFFVP 
130 140 150 160 170 180 

190 200 210 220 230 240 

Y FD P P VT AXAW AV FV GG I LQLX FQL PW LAKLG FLKLPKLS FKD AAVNR VMKQ MA PA I LGV 
| I I I I I I I I 1 I I I I i I I I I I I I I I I I I I I I I I I II I I I I I I I I I M I I I I I I I I I I I I 
YFDPP VTA1AWAVFVGG I LQLG FQL PWLAKLG FLKLPKLS FKDAAVNRVMKQ MAPAI LGV 

190 200 210 220 230 240 

250 260 270 280 290 300 

S VAQVS LV I NT I FAS YLQSGSVSWMYYADRMMELPSGVLGAALGT I LLPTLSKHSANQDT 
I ) I I : t II M II I I I I I I I I I I I I M I I I I I I I I I : I I I I 1 I I I I I I I I I 1 1 I I I I! I I I 
SVAQISLVI NTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 

250 260 270 280 290 300 

310 320 330 340 350 360 

EOFSALLPWGLR LCMLLTLPAAVGLAVLS FPLVATLFMYRXFTLFDAQMTQH ALIAYSFG 
I I | || I I I I I I I I I I 1 I I 11 I I I : I I I I I I I I I I I I 1 I I I I I I I I I I I I I M I I I I I I 
EOFSALLDWGLR XCMLLTLPAAVGMAVLS FPLVATLFMYREFTLFDAQMTQH ALIAYSFG 

310 320 330 340 350 360 

370 380 390 400 410 420 

LIGLIMIKVL APGFYARQNIXXPV KIAIFTLICXQLMNLXFX GPLXXIGLS LAIGLGACI 

I I I i I I I I I I i I I I I 1 I t I I : II I II I I I I I I : II I ! I I III : I I I I II I I t I f t 
LIGLIMIKVIAPGFYARQNIKTPVKIAIFTLICTQI^NLAFIGPLKHVGLSLAIGLGACI 

370 380 390 400 410 420 

430 440 450 

NAGLLFYL LRRHG I YQPXQGLGS VLXQKCCS RS PX 

II I I I I I II I I I I I I I I : I :: I : 

NAGU.FYL LRRHGIYQPGKGWA AFLAKMLLSLAVMGGGL YAAQIWLPFDWAHAGGMQKAA 
430 440 450 460 470 480 

The complete length ORF20a nucleotide sequence <SEQ ID 117> is: 

1 ATGAATATGC TGGGAGCTTT GGTAAAAGTC GGCAGCCTGA CGATGGTGTC 
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGC GCATTCGGCG 
101 CAGGCATGGC GACGGATGCG TTCTTTGTCG CGTTCAAACT GCCCAACCTG 
151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 
201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGACG GAGGCTTTTA 
251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTCAT CGTTACCGCG 
301 CTGGGCATAC TTGCCGCGCC XTGGGTGATT TATGTTTCCG CACCCGGTTT 
351 TGCCAAAGAT GCCGACAAAT TTCAGCTCTC TATCGATTTG CTGCGGATTA 
401 CGTTTCCTTA TATCTTATTG ATTTCACTTT CCTCTTTTGT CGGCTCGGTA 
451 CTCAATTCCT ATCATAAATT CAGCATTCCT GCGTTTACGC CCACGTTCCT 
501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC 
551 CTCCCGTTAC CGCGCTGGCT TGGGCGGTTT TTGTCGGCGG CATTTTGCAA 
601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGTTTTT TGAAACTGCC 
651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG 
701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGATTTCTTT GGTGATCAAC 
751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA 
801 CGCCGACCGC ATGATGGAAC TGCCCGGCGG CGTGCTGGGG GCGGCACTCG 
851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 
901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCNTGT GCATGCTGCT 
951 GACGCTGCCG GCGGCGGTCG GAATGGCGGT GTTGTCGTTC CCGCTGGTGG 
1001 CAACCTTGTT TATGTACCGA GAATTCACGC TGTTTGACGC GCAGATGACG 
1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATCATGAT 
1101 TAAAGTGTTG GCGCCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG 
1151 TCAAAATCGC CATCTTCACG CTCATTTGCA CGCAGTTGAT GAACCTTGCC 
1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG 
1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA 
1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTGGCAAA AATGCTGCTC 
1351 TCGCTCGCCG TGATGGGAGG CGGCCTGTAT GCCGCCCAAA TCTGGCTGCC 
1401 GTTCGACTGG GCACACGCCG GCGGAATGCA AAAGGCCGCC CGGCTCTTCA 
1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG 
1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA 



This encodes a protein having amino acid sequence <SEQ ID 118>:

1 MNMLGALVKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLVIVTA
101 LGILAAPWVI YVSAPGFAKD ADKFQLSIDL LRITFPYILL ISLSSFVGSV
151 LNSYHKFSIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ
201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQISLVIN
251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT
301 EQFSALLDWG LRXCMLLTLP AAVGMAVLSF PLVATLFMYR EFTLFDAQMT
351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA
401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL
451 SLAVMGGGLY AAQIWLPFDW AHAGGMQKAA RLFILIAVGG GLYFASLAAL
501 GFRPRHFKRV ES*

ORF20a and ORF20-1 show 96.5% identity in 512 aa overlap:

                    10        20        30        40        50        60
orf20a.pep  MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
orf20-1     MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF

                    70        80        90       100       110       120
orf20a.pep  AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAKD
orf20-1     AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD

                   130       140       150       160       170       180
orf20a.pep  ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFSIPAFTPTFLNVSFIVFALFFVP
orf20-1     ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPTFLNVSFIVFALFFVP

                   190       200       210       220       230       240
orf20a.pep  YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV
orf20-1     YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV

                   250       260       270       280       290       300
orf20a.pep  SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT
orf20-1     SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT

                   310       320       330       340       350       360
orf20a.pep  EQFSALLDWGLRXCMLLTLPAAVGMAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG
orf20-1     EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG

                   370       380       390       400       410       420
orf20a.pep  LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI
orf20-1     LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI

                   430       440       450       460       470       480
orf20a.pep  NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMGGGLYAAQIWLPFDWAHAGGMQKAA
orf20-1     NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG

                   490       500       510
orf20a.pep  RLFILIAVGGGLYFASLAALGFRPRHFKRVESX
orf20-1     QLCILIAVGGGLYFASLAALGFRPRHFKRVENX



Homology with a predicted ORF from N.gonorrhoeae
ORF20 shows 92.1% identity over a 454aa overlap with a predicted ORF (ORF20ng) from
N.gonorrhoeae:

orf20.pep   MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60
orf20ng     MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60

orf20.pep   AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD 120
orf20ng     AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD 120

orf20.pep   ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 180
orf20ng     ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 180

orf20.pep   YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 240
orf20ng     YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 240

orf20.pep   SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 300
orf20ng     SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 300

orf20.pep   EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG 360
orf20ng     EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 360

orf20.pep   LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXGPLXXIGLSLAIGLGACI 420
orf20ng     LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420

orf20.pep   NAGLLFYLLRRHGIYQPXQGLGSVLXQKCCSRSP 454
orf20ng     NAGLLFFLFRKHGIYRPGQGLGQPSWRKCCSRSP 454
An ORF20ng nucleotide sequence <SEQ ID 119> was predicted to encode a protein having amino
acid sequence <SEQ ID 120>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA

101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI

151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ

201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN

251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT

301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT

351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA

401 FIGPLKHAGL SLAIGLGACI NAGLLFFLFR KHGIYRPGQG LGQPSWRKCC

451 SRSP*

Further DNA sequence analysis revealed the following DNA sequence <SEQ ID 121>: 

1 ATGAATATGC TTGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGAcg gAGGCTTTTA 

251 TCCGCCACGt tgcgggAatg CTGTCGTTTG TGCTGATcgt cGttacCGCG 

301 CTGGGCATAC TTGCCGCgcc tTGGGTGATT TATGTTtccg CgcccGGCTT 

351 TACCAAAGAC GCGGACAAGT TCCAACTTTC CATCAGCCTG CTGCGGATTA 

401 CGTTTCCTTA TATATTATTG ATTTCTTTGT CTTCTTTTGT CGGCTCGATA 

451 CTCAATTCCT ACCATAAGTT CGGCATTCCC GCGTTTACGC CCACGTTTTT

501 AAACATCTCT TTTATCGTAT TCGCACTGTT TTTCGTGCCG TATTTCGATC 

551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTTT TTGTCGGCGG TATTTTGCAG 

601 CTCGGTTTCC AACTGCCGTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC 

651 CAAACTGAAT TTCAAAGATG CGGCGGTCAA CCGCGTCATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG agcgTGGCGC AAATTTCTTT GgttATCAAC 

751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTatta 

801 cgCCGACCGC ATGATGGAGc tgcgccGGGG CGTGCTGGGG GCTGCACTCG 

851 GTACAATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT

951 GACGCTGCCG GCGGCGGccg GACTGGCGGT ATTGTCGTTC CCGCTGGTGG

1001 CGACGCTGTT TATGTACCGA GAATTCACGC TGTTTGACGC ACAAATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATTATGAT 

1101 TAAAGTGTTG GCATCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTCGCC

1201 TTTATCGGTC CGTTGAAACA CGCCGGGCTT TCGCTCGCCA TCGGCCTGGG

1251 CGCGTGCATC AACGCCGGAT TGTTGTTCTT CCTGTTGCGC AAACACGGTA 

1301 TTTACCGGCC cggcaggggt tgggcggcgt TCTTGGCGAA AATGCTGCTC 

1351 GCGCTCGCCG TGATGTGCGG CGGACTGTGG GCGGCGCAGG CTTGCCTGCC 

1401 GTTCGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCTCT GGCGGCTTTG 

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA 

This encodes the following amino acid sequence <SEQ ID 122; ORF20ng-1>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA

101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI

151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ

201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN

251 TIFASYLQSG SVSWMYYADR MMELRRGVLG AALGTILLPT LSKHSANQDT

301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT

351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA

401 FIGPLKHAGL SLAIGLGACI NAGLLFFLLR KHGIYRPGRG WAAFLAKMLL

451 ALAVMCGGLW AAQACLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL

501 GFRPRHFKRV ES*
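
The relationship between a nucleotide listing and the protein it encodes can be checked mechanically. The following is a minimal sketch, not part of the original disclosure: it assumes the standard genetic code (NCBI translation table 1), a reading frame that begins at nucleotide 1, and that any codon containing an ambiguous base is reported as X. The names `CODON_TABLE` and `translate` are hypothetical helpers introduced only for illustration. The two test rows are the first and last rows of <SEQ ID 121>.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace and letter case are ignored.

    Codons that contain anything other than A, C, G, T become 'X';
    a trailing partial codon is dropped; stop codons become '*'.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


# First row (nucleotides 1-50) and last row (1501-1539) of <SEQ ID 121>.
first_row = "ATGAATATGC TTGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC"
last_row = "GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA"

print(translate(first_row))  # start of the ORF20ng-1 protein
print(translate(last_row))   # end of the ORF20ng-1 protein, including the stop
```

Because nucleotide 1501 begins a codon (1500 is divisible by 3), the last row can be translated on its own; rows that start mid-codon would need to be joined to their neighbours first.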

ORF20ng-1 and ORF20-1 show 95.7% identity in 512 aa overlap:

                    10        20        30        40        50        60
orf20-1.pep MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
orf20ng-1   MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF

                    70        80        90       100       110       120
orf20-1.pep AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD
orf20ng-1   AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD

                   130       140       150       160       170       180
orf20-1.pep ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPTFLNVSFIVFALFFVP
orf20ng-1   ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP

                   190       200       210       220       230       240
orf20-1.pep YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV
orf20ng-1   YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV

                   250       260       270       280       290       300
orf20-1.pep SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT
orf20ng-1   SVAQISLVINTIFASYLQSGSVSWMYYADRMMELRRGVLGAALGTILLPTLSKHSANQDT

                   310       320       330       340       350       360
orf20-1.pep EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG
orf20ng-1   EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG

                   370       380       390       400       410       420
orf20-1.pep LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI
orf20ng-1   LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI

                   430       440       450       460       470       480
orf20-1.pep NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG
orf20ng-1   NAGLLFFLLRKHGIYRPGRGWAAFLAKMLLALAVMCGGLWAAQACLPFEWAHAGGMRKAG

                   490       500       510
orf20-1.pep QLCILIAVGGGLYFASLAALGFRPRHFKRVENX
orf20ng-1   QLCILIAVGGGLYFASLAALGFRPRHFKRVESX
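
The identity figure quoted for ORF20-1 and ORF20ng-1 can be recomputed from the two aligned sequences. The sketch below is illustrative only and is not the program used to generate the original alignment: it assumes an ungapped, position-by-position comparison over the 512-residue overlap (the terminal stop symbol is left out), and the sequence strings are transcribed from the alignment, so any transcription slip would change the count. `percent_identity` is a hypothetical helper name.

```python
# Sequences transcribed from the ORF20-1 / ORF20ng-1 alignment (stop symbol omitted).
# Assumption: the transcription is exact and the overlap is ungapped.
ORF20_1 = (
    "MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF"
    "AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD"
    "ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPTFLNVSFIVFALFFVP"
    "YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV"
    "SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT"
    "EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG"
    "LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI"
    "NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG"
    "QLCILIAVGGGLYFASLAALGFRPRHFKRVEN"
)
ORF20NG_1 = (
    "MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF"
    "AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD"
    "ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP"
    "YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV"
    "SVAQISLVINTIFASYLQSGSVSWMYYADRMMELRRGVLGAALGTILLPTLSKHSANQDT"
    "EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG"
    "LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI"
    "NAGLLFFLLRKHGIYRPGRGWAAFLAKMLLALAVMCGGLWAAQACLPFEWAHAGGMRKAG"
    "QLCILIAVGGGLYFASLAALGFRPRHFKRVES"
)


def percent_identity(a: str, b: str):
    """Return (identical positions, percent identity) for two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, 100.0 * matches / len(a)


matches, pct = percent_identity(ORF20_1, ORF20NG_1)
print(f"{matches}/{len(ORF20_1)} identical residues = {pct:.1f}% identity")
```

Under these assumptions the count agrees with the 95.7% figure stated for the 512 aa overlap.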
In addition, ORF20ng-1 shows significant homology with a virulence factor of S.typhimurium:

sp|P37169|MVIN_SALTY VIRULENCE FACTOR MVIN pir||S40271 mviN protein - Salmonella
typhimurium gi|438252 (Z26133) mviB gene product [Salmonella typhimurium]
gnl|PID|d1005521 (D25292) ORF2 [Salmonella typhimurium] Length = 524

Score = 1573 (750.1 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220

Identities = 309/467 (66%), Positives = 368/467 (78%)

Query:   1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60
           MN+L +LA V S+TM SRVLGF RD FGAGMATDAFFVAFKLPNLLRR+FAEGAF
Sbjct:  14 MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF 73

Query:  61 AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD 120
           +QAFVPILAEYK + +EAT F+ +V+G+L+ L VVT G+LAAPWVI V+APGF
Sbjct:  74 SQAFVPILAEYKSKQGEEATRIFVAYVSGLLTLAIAVVTVAGMLAAPWVIMVTAPGFADT 133

Query: 121 ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 180
           ADKF L+ LLRITFPYILLISL+S VG+ILN++++F IPAF PTFLNIS I FALF P
Sbjct: 134 ADKFALTTQLLRITFPYILLISLASLVGAILNTWNRFSIPAFAPTFLNISMIGFALFAAP 193

Query: 181 YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 240
           YF+PPV ALAWAV VGG+LQL +QLP+L K+G L LP++NF+D RV+KQM PAILGV
Sbjct: 194 YFVPPVIALAWAVVVGGVLQLVYQLPYLKKIGMLVLPRINFRDTGAMRVVKQMGPAILGV 253

Query: 241 SVAQISLVINTIFASYLQSGSVSWMYYADRMMELRRGVLGAALGTILLPTLSKHSANQDT 300
           SV+QISL+INTIFAS+L SGSVSWMYYADR+ME GVLG ALGTILLP+LSK A+ +
Sbjct: 254 SVSQISLIINTIFASFLASGSVSWMYYADRLMEFPSGVLGVALGTILLPSLSKSFASGNH 313

Query: 301 EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 360
           +++ L+DWGLRLC LL LP+A L +L+ PL +LF Y +FT FDA MTQ ALIAYS G
Sbjct: 314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373

Query: 361 LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420
           LIGLI++KVLA GFY+RQ+IKTPVKIAI TLI TQLMNLAFIGPLKHAGLSL+IGL AC+
Sbjct: 374 LIGLIVVKVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433

Query: 421 NAGLLFFLLRKHGIYRPGRGWXXXXXXXXXXXXVMCGGLWAAQACLP 467
           NA LL++ LRK I+ P GW VM L+ +P
Sbjct: 434 NASLLYWQLRKQNIFTPQPGWMWFLMRLIISVLVMAAVLFGVLHIMP 480

Score = 70 (… .4 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220

Query: 469 EWAHAGGMRKAGQLCILIAVGGGLYFASLAALGFRPRHFKR 509
           EW+ + + +L ++ G YFA+LA LGF+ + F R
Sbjct: 481 EWSQGSMLWRLLRLMAVVIAGIAAYFAALAVLGFKVKEFVR 521



Based on this analysis, including the homology with a virulence factor from S.typhimurium, it is
predicted that these proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
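
The homology figures in the database comparison above come from a BLAST summary line of the form "Identities = 309/467 (66%), Positives = 368/467 (78%)". As a small illustration (not part of the original analysis), such a line can be parsed and its percentages checked against the raw counts. The function names `parse_blast_stats` and `percent_consistent` are hypothetical, and the sketch assumes the summary follows exactly this "name = n/d (p%)" layout; whether a given BLAST version truncates or rounds the percentage is not known here, so both are accepted.

```python
import re

# Matches fields such as "Identities = 309/467 (66%)" in a BLAST summary line.
STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_blast_stats(line: str) -> dict:
    """Return {field name: (numerator, denominator, reported percent)}."""
    return {
        name: (int(num), int(den), int(pct))
        for name, num, den, pct in STAT.findall(line)
    }


def percent_consistent(num: int, den: int, pct: int) -> bool:
    """True if pct equals num/den as a percentage, either truncated or rounded."""
    exact = 100.0 * num / den
    return pct in (int(exact), round(exact))


summary = "Identities = 309/467 (66%), Positives = 368/467 (78%)"
stats = parse_blast_stats(summary)
for name, (num, den, pct) in stats.items():
    print(name, f"{num}/{den}", f"{pct}%", percent_consistent(num, den, pct))
```

A check of this kind is mainly useful for catching transcription errors in the counts or percentages.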

Example 15 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 123>:

1 atGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA

51 GCAAGCCGTT tACGACGGCC CGGCCaTTAC CGAAGtCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTcAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT 

201 GTTTACTGCG CCGGCTTCAG GcAAAATCGC CGCGATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAArGCAA CGACGAAATC

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA tGGACACCAA TCCG..

This corresponds to the amino acid sequence <SEQ ID 124; ORF22>:

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA
51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEXNDEI
101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF
151 VNAMDTNP..

Further work revealed the complete nucleotide sequence <SEQ ID 125>:

1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGCCGTT TACGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT 

201 GTTTACTGCG CCGGCTTCAG GCAAAATCGC CGCGATTCAC CGTGGCGAAA

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATTAT

501 CAAAGAAGCC GCCGAGGATT TCAAACGCGG CCTGTTGGTA TTGAGCCGTT 

551 TGACCGAACG CAAAATCCAT GTTTGTAAGG CAGCTGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 TGCCGGTTTG AGTGGCACGC ACATTCATTT CATCGAGCCG GTCGGCGCGA 

701 ATAAAACCGT GTGGACCATC AATTATCAAG ATGTAATTAC CATTGGCCGT

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CCCTAGGTGG 

801 TTCTCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACACAGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT 
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951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 
1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CAT CACGCGT 
1051 ACAACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCAACACAGC 
1101 CGTCAACGGC GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 
5 1151 TGATGCCCTT GGATATCCTG CCCACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA GGCATTGGGT TGCTTGGAAT TGGACGAAGA 
1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 
1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA 

This corresponds to the amino acid sequence <SEQ ID 126; ORF22-1>:

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVITIGR

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDTDNRVI

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR

351 TTLGHFLKNK LFKFNTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG*

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 127>: 

1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA

51 GCAAGTCATT TATGACGGGC CCGTCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTNGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGNATC CGGGCGTGGT 

201 GTTTACCGCG CCNGTTTCAG GCAAAATCGC CGCCATCCAT CGCGGCGAAA 

251 AGCGCGTACT TCAGTCGGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC

301 GAGTTCGAAC GCTACGCGCC CGAAGCGTTG GCAAACTTAA GCGGCGANGA 

351 ANTNNGNNGC AATCTGATCC AATCCGGTTT GTGGACTGCG CTGCGTANCC 

401 GTCCGTTCAG CAAAATCCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTNGCG GCAGACCCTG TGGTTGTGAT 

501 CAAAGAAGCC GNCGANGATT TCAGACGANG TNTGCTGGTA TTGAGCCGTT

551 TGACCGAGCG TAAAATCCAT GTGTGTAAGG CAGCTGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 GGCCGGTTTG AGTGGCACGC ACATTCATTT CATTGAGCCG GTCGGTGCAA 

701 ACAAAACCGT TTGGACCATC AATTATCAAG ATGTAATTGC CATCGGACGT 

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CTTTGGGTGG

801 TTCTCAAGTC AACAAACCAC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACGCAGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT

1051 ACGACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCACGACAGC 

1101 CGTCAACGGT GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TAATGCCGCT AGACATCCTG CCTACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA AGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATANGGCC

1301 CGCTGTTGCG TAAGGTGCTG GAAACCNTTG AGAAGGAAGG CTGA 

This encodes a protein having amino acid sequence <SEQ ID 128; ORF22a>:

1 MIKIKKGLNL PIAGRPEQVI YDGPVITEVA LLGEEYAGMR PXMKVKEGDA

51 VKKGQVLFED KKXPGVVFTA PVSGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYAPEAL ANLSGXEXXX NLIQSGLWTA LRXRPFSKIP AVDAEPFAIF

151 VNAMDTNPLA ADPVVVIKEA XXDFRRXXLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDADNRVI

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EXGPLLRKVL ETXEKEG*

The originally-identified partial strain B sequence (ORF22) shows 94.2% identity over a 158aa
overlap with ORF22a:

                    10        20        30        40        50        60
orf22.pep   MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED
orf22a      MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED

                    70        80        90       100       110       120
orf22.pep   KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR
orf22a      KKXPGVVFTAPVSGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGXEXXX

                   130       140       150       160       170       180
orf22.pep   NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP
orf22a      NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVVVIKEAXXDFRRXXLV



The complete strain B sequence (ORF22-1) and ORF22a show 94.9% identity in 447 aa overlap:

                    10        20        30        40        50        60
orf22a.pep  MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED
orf22-1     MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED

                    70        80        90       100       110       120
orf22a.pep  KKXPGVVFTAPVSGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGXEXXX
orf22-1     KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGEEVRR

                   130       140       150       160       170       180
orf22a.pep  NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVVVIKEAXXDFRRXXLV
orf22-1     NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV

                   190       200       210       220       230       240
orf22a.pep  LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI
orf22-1     LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI

                   250       260       270       280       290       300
orf22a.pep  NYQDVIAIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDADNRVI
orf22-1     NYQDVITIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDTDNRVI

                   310       320       330       340       350       360
orf22a.pep  SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK
orf22-1     SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK

                   370       380       390       400       410       420
orf22a.pep  LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA
orf22-1     LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA

                   430       440
orf22a.pep  LCSFVCPGKYEXGPLLRKVLETXEKEGX
orf22-1     LCSFVCPGKYEYGPLLRKVLETIEKEGX

Further work identified a partial gene sequence <SEQ ID 129> from N.gonorrhoeae, which
encodes the following amino acid sequence <SEQ ID 130; ORF22ng>:

1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA
51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI
101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF
151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP
201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR
251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI
301 SGSVLNGAIA QGAHDYLGRY HN*

Further work identified the complete gonococcal gene <SEQ ID 131>:

1 ATGATTAAAA TCAAAAAAGG TCTAAATCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGTCATT TATGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGT CGGCATGCGC CCCTCGATGA AAATCAAGGA AGGTGAAGCC

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTAGT 

201 ATTTACTGCG CCGGCTTCAG GCAAAATCGC CGCTATTCAC CGTGGCGAAA

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTCGAAC GCTACGTACC TGAAGCGCTG GCAAAATTGA GCAGCGAAAA 

351 AGTGCGCCGC AACCTGATTC AATCAGGCTT ATGGACTGCG CTTCGCACCC 

401 GTCCGTTCAG CAAAATCCCT GCCGTAGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATCAT

501 CAAAGAAGCC GCCGAAGACT TCAAACGCGG CCTGTTGGTA TTGAGCCGCC 

551 TGACCGAACG TAAAATCCAT GTGTGTAAAG CAGCAGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAATAT CGAAACACAT GAATTTGGCG GCCCGCATCC 

651 TGCCGGCTTG AGTGGCACGC ACATTCATTT CATCGAGCCA GTCGGCGCGA 

701 ATAAAACCGT GTGGACCATC AATTATCAAG ACGTGATTGC TATCGGACGT

751 TTGTTCGTAA CAGGCCGTCT GAATACCGAG CGCGTGGTTG CCTTGGGCGG 

801 CCTGCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAGG 

851 TGTCTCAACT TACCGCCGGC GAATTGGTTG ACGCGGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG TGCGATTGCA CAAGGCGCGC ATGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGC 

1051 ACCACTCTCG GCCATTTCCT AAAAAACAAA CTCTTCAAGT TCACGACAGC 

1101 CGTCAACGGC GGCGACCGCG CCATGGTACC GATCGGCACT TATGAGCGCG 

1151 TAATGCCGTT GGACATCCTG CCTACCTTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA GGCTTTGGGT TGCTTGGAAT TGGACGAAGA

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA 

This encodes a protein having amino acid sequence <SEQ ID 132; ORF22ng-1>:

1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR

251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI

301 SGSVLNGAIA QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG*



The originally-identified partial strain B sequence (ORF22) shows 93.7% identity over a 158aa
overlap with ORF22ng:

orf22.pep   MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60
orf22ng     MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 60

orf22.pep   KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR 120
orf22ng     KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR 120

orf22.pep   NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158
orf22ng     NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 180

The complete sequences from strain B (ORF22-1) and gonococcus (ORF22ng-1) show 96.2%
identity in 447 aa overlap:

                    10        20        30        40        50        60
orf22-1.pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED
orf22ng-1   MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED

                    70        80        90       100       110       120
orf22-1.pep KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGEEVRR
orf22ng-1   KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR

                   130       140       150       160       170       180
orf22-1.pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV
orf22ng-1   NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV

                   190       200       210       220       230       240
orf22-1.pep LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI
orf22ng-1   LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI

                   250       260       270       280       290       300
orf22-1.pep NYQDVITIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDTDNRVI
orf22ng-1   NYQDVIAIGRLFVTGRLNTERVVALGGLQVNKPRLLRTVLGAKVSQLTAGELVDADNRVI

                   310       320       330       340       350       360
orf22-1.pep SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK
orf22ng-1   SGSVLNGAIAQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK

                   370       380       390       400       410       420
orf22-1.pep LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA
orf22ng-1   LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA

                   430       440
orf22-1.pep LCSFVCPGKYEYGPLLRKVLETIEKEGX
orf22ng-1   LCSFVCPGKYEYGPLLRKVLETIEKEGX



Computer analysis of these sequences gave the following results:

Homology with 48kDa outer membrane protein of Actinobacillus pleuropneumoniae (accession
number U24492).

ORF22 and this 48kDa protein show 72% aa identity in 158aa overlap:

orf22   1 MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60
          MI IKKGL+LPIAG P Q +++G + EVA+LGEEY GMRPSMKV+EGD VKKGQVLFED
48kDa   1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

orf22  61 KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR 120
          KKNPGVVFTAPASG + I+RGEKRVLQSVVI VE +++I F RY LA+LS E+V++
48kDa  61 KKNPGVVFTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120

orf22 121 NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158
          NLI+SGLWTA RTRPFSK+PA+DA P +IFVNAMDTNP
48kDa 121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNP 158

ORF22a also shows homology to the 48kDa Actinobacillus pleuropneumoniae protein: 

gi|1185395 (U24492) 48 kDa outer membrane protein (Actinobacillus pleuropneumoniae)
Length = 449

Score = 530 bits (1351), Expect = e-150
Identities = 274/450 (60%), Positives = 323/450 (70%), Gaps = 4/450 (0%)

Query: 1 MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED 60 

MI IKKGL+LPIAG P QVI++G + EVA+LGEEY GMRP MKV+EGD VKKGQVLFED 
Sbjct: 1 MITIKKGLDLPIAGTPAQVIKNGNTVNEVAMLGEEYVGMRPSMKVREGDWKKGQVLFED 60 

Query: 61 KKXPGWFTAPVSGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGXEXXX 120 

KK PGWFTAP SG + I +RGEKRVLQS WI VEG+++I F RY LA+LS + 
Sbjct: 61 KKNPGWrTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ -120 

Query: 121 NLIQSGLWTALRXRPFSKI PAVDAEPFAI rVNAMDTNPLAADPWVIKEAXXDFRRXXLV 180 

NLI+SGLWTA R RPFSK+PA+DA P +IFVNAMDTNPLAADP W+KE DF+ V 
Sbjct: 121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNPLAADPEVVLKEYETDFKDGLTV 180 

Query: 181 LSRL — TERKIHVCKAAGADVP-SENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTV 237 

L+RL ++ +++CK A +++P S I F G HPAGL GTHIHF++PVGA K V 

Sbjct: 181 LTRLFNGQKPVYLCKDADSNIPLSPAIEGITIKSFSGVHPAGLVGTHIHFVDPVGATKQV 240

Query: 238 WTINYQDVIAIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDADN 297 

W +NYQDVIAIG+LF ?G L T+R+I+L G QV PRL+RT LGA +SQ+TA EL +N 
Sbjct: 241 WHLNYQDVIAIGKLFTTGELFTDRIISLAGPQVKNPRLVRTRLGANLSQLTANELNAGEN 300 

Query: 298 RVISGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFL 357 

RVISGSVL+GA G DYLGRY Q+SV+ EGR KELFGW+ P DK+SITRT LGHF 
Sbjct: 301 RVISGSVLSGATAAGPVDYLGRYALQVSVLAEGREKELFGWIMPGSDKFSITRTVLGHFG 360 

Query: 358 KNKLFKFTTAVNGGDRAMVPIGTYERVMXXXXXXXXXXXXXXVGDTDSAQXXXXXXXXXX 417 

K KLF FTTAV+GG+RAMVPIG YERVM GDTDSAQ 
Sbjct: 361 K-KLFNFTTAVHGGERAMVPIGAYERVMPLDIIPTLLLRDLAAGDTDSAQNLGCLELDEE 419 

Query: 418 XXXXXSFVCPGKYEXGPLLRKVLETXEKEG 447
++VCPGK GP+LR LE EKEG 

ORF22ng-1 also shows homology with the OMP from A.pleuropneumoniae:

gi|1185395 (U24492) 48 kDa outer membrane protein (Actinobacillus pleuropneumoniae)
Length = 449
Score = 555 bits (1414), Expect = e-157

Identities = 284/450 (63%), Positives = 337/450 (74%), Gaps = 4/450 (0%) 

Query: 27 MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 86 

MI IKKGL+LPIAG P QVI++G + EVA+LGEEYVGMRPSMK++EG+ VKKGQVLFED 
Sbjct: 1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

Query: 87 KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYVPEALAKLSSEKVRR 14 6 

KKNPGWFTAPASG + I+RGEKRVLQSWI VEG+++I F RY LA LS+E+V++ 
Sbjct: 61 KKNPGVVFTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120

Query: 147 NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 206

NLI+SGLWTA RTRPFSK+ PA+DA P + 1 FVNAMDTNPLAADP V++KE DFK GL V 
Sbjct: 121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNPLAADPEVVLKEYETDFKDGLTV 180

Query: 207 LSRL — TERKIHVCKAAGADVp-SENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTV 263 

L+RL ++ +++CK A +++P S I F G HPAGL GTHIHF++PVGA K V 

Sbjct: 181 LTRLFNGQKPVYLCKDADSNIPLSPAIEGITIKSFSGVHPAGLVGTHIHFVDPVGATKQV 240

Query: 264 WTINYQDVIAIGRLFVTGRLNTERWALGGLQVNKPRLLRTVLGAKVSQLTAGELVDADN 323 

W +NYQDVIAIG+LF TG L T+R+++L G QV PRL+RT LGA +SQLTA EL +N 
Sbjct: 241 WHLNYQDVIAIGKLFTTGELFTDRIISLAGPQVKNPRLVRTRLGANLSQLTANELNAGEN 300 

Query: 324 RVISGSVLNGAIAQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFL 383 

RVISGSVL+GA A G DYLGRY Q+SV+ EGR KELFGW+ P DK+SITRT LGHF 
Sbjct: 301 RVISGSVLSGATAAGPVDYLGRYALQVSVLAEGREKELFGWIMPGSDKFSITRTVLGHFG 360

Query: 384 KNKLFKFTTAVNGGDRAMVPIGTYERVMXXXXXXXXXXXXXXVGDTDSAQXXXXXXXXXX 443 

K KLF FTTAV+GG+RAMVPIG YERVM GDTDSAQ 
Sbjct: 361 K-KLFNFTTAVHGGERAMVPIGAYERVMPLDIIPTLLLRDLAAGDTDSAQNLGCLELDEE 419



Query: 444 XXXXXSFVCPGKYEYGPLLRKVLETIEKEG 473

++VCPGK YGP+LR LE I EKEG 
Sbjct: 420 DLALCTYVCPGKNNYGPMLRAALEKIEKEG 449

Based on this analysis, including the homology with the outer membrane protein of Actinobacillus
pleuropneumoniae, it was predicted that these proteins from N.meningitidis and N.gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF22-1 (35.4kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
5A shows the results of affinity purification of the GST-fusion protein, and Figure 5B shows the
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure 5C). These
experiments confirm that ORF22-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 16

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 133>: 

1 . . GCGnCGnAAA TCATCCATCC CC.nACGTC GTAGGCCCTG AAGCCAACTG 

51 GTTTTTTATG GTAGCCAGTA CGTTTGTGAT TGCTTTGATT GGTTATTTTG 

101 TTACTGAAAA AATCGTCGAA CCGCAATTGG GCCCTTATCA ATCAGATTTG 

151 TCACAAGAAG AAAAAGACAT TCGGCATTCC AATGAAATCA CGCCTTTGGA

201 ATATAAAGGA TTAATTTGGG CTGGCGTGGT GTTTGTTGCC TTATCCGCCC 

251 TATTGGCTTG GAGCATCGTC CCTGCCGACG GTATTTTGCG TCATCCTGAA 

301 ACAGGATTGG TTTCCGGTTC GCCGTTTTTA AAATCGATTG TTGTTTTTAT 

351 TTTCTTGTTG TTTGCACTGC CGGGCATTGT TTATGGCCGG GTAACCCGAA 

401 GTTTGCGCGG CGAACAGGAA GTCGTTAATG CGmyGGCCGA ATCGATGAGT

451 ACTCTGGsGC TTTmTTTGsw CAkcATCTTT TTTGCCGCAC AGTTTGTCGC

501 ATTTTTTAAT TGGACGAATA TTGGGCAATA TATTGCCGTT AAAGGGGCGA 

551 CGTTCTTAAA AGAAGTCGGC TTGGGCGGCA GCGTGTTGTT TATCGGTTTT 

601 ATTTTAATTT GTGCTTTTAT CAATCTGATG ATAGGCTCCG CCTCCGCGCA 

651 ATGGGCGGTA ACTGCGCCGA TTTTCGTCCC TATGCTGATG TTGGCCGGCT

701 ACGCGCCCGA AGTCATTCAA GCCGCTTACC GCATCGGTGA TTCCGTTACC 

751 AATATTATTA CGCCGATGAT GAGTTATTTC GGGCTGATTA TGGCGACGGT 

801 GrkCntramTAC AAAAAAGATG CGGGCGTGGG TaCGcTGATT wCTATGATGT 

851 TGCCGTATTC CGCTTTCTTC TTGATTGCgT GGATTGCCTT ATTCTGCATT 

901 TGGGTATTTg TTTTGGGCCT GCCCGTCGGT CCCGGCGCGC CCACATTCTA

951 TCCCGCACCT TAA 

This corresponds to the amino acid sequence <SEQ ID 134; ORF12>: 

1 ..AXXIIHPXXV VGPEANWFFM VASTFVIALI GYFVTEKIVE PQLGPYQSDL

51 SQEEKDIRHS NEITPLEYKG LIWAGVVFVA LSALLAWSIV PADGILRHPE

101 TGLVSGSPFL KSIVVFIFLL FALPGIVYGR VTRSLRGEQE VVNAXAESMS

151 TLXLXLXXIF FAAQFVAFFN WTNIGQYIAV KGATFLKEVG LGGSVLFIGF

201 ILICAFINLM IGSASAQWAV TAPIFVPMLM LAGYAPEVIQ AAYRIGDSVT

251 NIITPMMSYF GLIMATVXXY KKDAGVGTLI XMMLPYSAFF LIAWIALFCI

301 WVFVLGLPVG PGAPTFYPAP *

Further sequence analysis revealed the complete DNA sequence <SEQ ID 135> to be:

1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCATCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGCC TCTGCCGTCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTTACATT GTCAGCCTGC TCAATGCCGA CGGTTTTATC AAAATCCTGA

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCGCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT 

451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA

501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC 

651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT 

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA 

751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC

801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT 

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT 

901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT 

951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA 

1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG

1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT 

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG 

1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC 

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC 

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC 

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC 

1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA 

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC 

1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC

1551 ATTCTATCCC GCACCTTAA 

This corresponds to the amino acid sequence <SEQ ID 136; ORF12-1>:

1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS

51 VPDPRPVGAK GRADDGLIYI VSLLNADGFI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGAPTFYP AP*
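
The correspondence between the nucleotide listings and the amino acid listings in this Example can be checked by translating the DNA in frame. The sketch below assumes the standard genetic code and reading frame 1; the function name `translate` is an assumption made for illustration and is not part of this document. Codons containing an ambiguous base (such as `n`) are reported as `X`, which is how the partial sequence <SEQ ID 134> appears to represent them.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption:
# the document does not state which translation table was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 1; unknown or ambiguous codons become 'X'."""
    seq = "".join(dna.split()).upper()  # drop the spaces between 10-base groups
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of <SEQ ID 135> and the first six codons of <SEQ ID 133>.
print(translate("ATGAGTCAAA CCGATACGCA ACGGGACGGA"))  # MSQTDTQRDG
print(translate("GCGnCGnAAA TCATCCAT"))               # AXXIIH
```

The first output matches the start of <SEQ ID 136> and the second matches the start of <SEQ ID 134>, which is a quick way to spot extraction errors in either listing.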

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF12 shows 96.3% identity over a 320aa overlap with an ORF (ORF12a) from strain A of N. 
meningitidis: 

10 20 30 

orfl2 pep AXXIIHPXXWGPEANWFFMVASTFVIALI 

40 i 1 1 1 1 i n n 1 1 1 1 1 u 1 1 1 1 1 1 1 1 1 

orfl2a AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEAKWFFMVASTFVIALI 
180 190 200 210 220 230 

40 50 60 70 80 90 

45 orfl2 pep GYEVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 

| I | | | | | | | || | | | | | | I | | I I I I I I I I I I I M I I I I 1 I I I I I I I I I I I i I I I I 

orfl2a GYFVTEKIVEPQLGPYCSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 
240 250 260 270 280 290 

50 100 110 120 130 140 150 

or fl2 pep PADGILRHPETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAXAESMS 
) 1 1 | I I M I I I 1 t < I 1 M t ! 1 1 I I I I I I 1 I f I I I I I I 1 t I I I I I I I II I 1 I I M Mill 
orfl2a PADGILRHPETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAMAESMS 
300 310 320 330 340 350 

55 

160 170 180 190 200 210 

orfl2 pep T LX LXLXX I FFAAQFVAFFNWTN I GQY I AVKGAT FLKE VGLGG S VLFI G FI L I CAFI N LM 

II I | IIMIMIMIMIIIMIIIMIMMMIIMMMIMIIMI I 

or*12a TLGLYLVI I FFAAQ FVAFFNVTTN I GQY I AVKGAT FLKEVGLGGS VLFI GFILICAFINLM 

60 360 370 380 390 400 410 

220 230 240 250 260 270 

orf12.pep  IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVXXY
           |||||||||||||||||||||||||||||||||||||||||||||||||||||||||  |
orf12a     IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKY

420 430 440 450 460 470

280 290 300 310 320 

orfl2 oeP KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 

° |M III llll III II II HI II HUM! Ill II II I INN II III III 

KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 
480 490 500 510 520 



orfl2a 



The complete length ORF12a nucleotide sequence <SEQ ID 137> is: 



1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGCC TCTGCCGCCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTCACGTT GTCAGCCTGC TCGATGCTGA CGGTTTGATC AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCTCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT 

451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA 

501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC 

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC 

651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT 

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA 

751 GATTTGTCAC AAGAAGAAAA AGACATTCGA CATTCCAATG AAATCACGCC 

801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT 

901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CAATTGTTGT 

951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA 

1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG 

1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT 

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG 

1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC 

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC 

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG 

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC 

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC 

1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA 

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC

1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC 

1551 ATTCTATCCC GCACCTTAA 

This encodes a protein having amino acid sequence <SEQ ID 138>: 

1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAAGAYFGLS

51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGAPTFYP AP*

ORF12a and ORF12-1 show 99.0% identity in 522 aa overlap: 

10 20 30 40 50 60 

orfl2a oeo MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAAGAYFGLSVPDPRPVGAK 
H F , | | 1 | 1 | 1 | | | I I t I I I I I I t t I I I I I I I I I I I I I I I 1 1 I t I r I I I I I 1 I 1 I I I I I t I I I 

orfl2-l MSCTTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf12a.pep  GRADDGLIHVVSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
            ||||||||::||||:|||:|||||||||||||||||||||||||||||||||||||||||
orf12-1     GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR

130 140 150 160 170 180

orfl2a pep LLLTKSPRKLTTFMVVFTGILSNTASELGYWLIPLSAIIFHSLGRHPLAGLAAAFAGVS 
| | I I I ! II I I I I I U I I I I | I I I M I I I ! I I I M I I I 1 I I I | | I I I I I M I N I I I I M I 
orfl2-l LLLTKSPRKLTTFMWFTGILSNTASELGYWLIPLSMIFHSI^RHPI^GLAAAFAGVS 
130 140 150 160 170 180 

190 200 210 220 230 24_0 

orfl2a pep GGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMVASTrVlALIGYBVTEKI 
I I 1 I I I I I I I I I I I I I I I I I I t I I 1 I I I I t I I t 1 I I I I I I I I | | | 1 | I I 1 t 1 I i i I I I 1 1 
orfl2-l GGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMVASTFVIALIGYFVTEKI 

190 200 210 220 230 240 

250 260 270 280 290 300 

orfl2a.pep VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIVPADGILRH 

I I I I I I I I I I | I I I I I I | 1 M I I 1 I M 1 1 i I I i I I I I I ! I I I) I I 11 I i ) I I I I i 

orf 12-1 VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIVPADGILRH 

250 260 270 280 290 300 

310 320 330 340 350 360 

orfl2a peo PETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAMAESMSTLGLYLVI 
| | | I | M II I I I 1 i I I I M I I I I i i II I I M I I I I ! I I I I I I I I I I I I I I I I I I I M I \ I 
orfi2-l PETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAMAESMSTLGLYLVI 

310 320 330 340 350 360 

370 380 390 400 410 420 

orfl2a pep IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFIKLMIGSASAQW 
| I | | | | | | I | t I I I I I I 1 I 1 I I I I U M I I I ! I I M I I I II I I I I I I I I I II II I II I 1 I 
orfl2-l IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQK 

370 380 390 400 410 420 

430 440 450 460 470 480 

or f 12a pep AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 

I I I I I I I I II I I I I I I M I 1 1 I I I I M I I I M I I I 1 1 1 1 1 I I I M i 1 1 I I I I II I M I I I 
orfl2-l AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 
430 440 450 460 470 480 

490 500 510 520 

or n2a . oep LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 
I | | I | | | I I I I I I I I I I I I I I I I I I I II I I I I ! I I I II I I I I I 
orf 12-1 LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 

490 50C 510 520 

Homology with a predicted ORF from N. gonorrhoeae 

ORF12 shows 92.5% identity over a 320aa overlap with a predicted ORF (ORF12.ng) from N. 
gonorrhoeae: 



orf12.pep                                AXXIIHPXXVVGPEANWFFMVASTFVIALI 30
                                         |  ||||  |||||||||||:|||||||||
orf12ng    AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMAASTFVIALI 232

orf12.pep  GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV 90
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12ng    GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV 292

orf12.pep  PADGILRHPETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAXAESMS 150
           ||||||||||||||:|||||||||||||||||||||||||:|||||||:||||| |||||
orf12ng    PADGILRHPETGLVAGSPFLKSIVVFIFLLFALPGIVYGRITRSLRGEREVVNAMAESMS 352

orf12.pep  TLXLXLXXIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM 210
           || | |  |||||||||||||||||||||||||:|||:  ||||||||||||||||||||
orf12ng    TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGAVFLKKFRLGGSVLFIGFILICAFINLM 412

orf12.pep  IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVXXY 270
           ||||||||||||||||||||||| ||:||||||||||||||||||||||||||||||  |
orf12ng    IGSASAQWAVTAPIFVPMLMLAGNAPQVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKY 472

orf12.pep  KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAP 320
           |||||||||| |||||||||||||||||||||||||||||||:|||||:|
orf12ng    KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVP 522

The complete length ORF12ng nucleotide sequence <SEQ ID 139> is: 



1 ATGAGTCAAA CCGACGCGCG TCGTAGCGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGcc tctgCCGTCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGTCCTGT TGGGGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTCACGTT GTCAGCCTGC TCGATGCCGA CGGTTTGATC AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCCCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCCAATA CGGCTTCTGA ATTGGGCTAT 

451 GTCGTCCTAA TCCCTTTGTC CGCCGTCATC TTTCATTCGC TCGGCCGCCA

501 TCCGCTTGCC GGTTTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC 

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC 

651 CAACTGGTTT TTTATGGCAG CCAGTACGTT TGTGATTGCT TTGATTGGTT 

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA 

751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC 

801 TTTGGAATAT AAAGGATTAA TTTGGGCAGG CGTGGTGTTT GTTGCCTTAT 

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT 

901 CCTGAAACAG GATTGGTTGC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT 

951 TTTTATTTTC TTGTTGTTTG CGCTGCCGGG CATTGTTTAT GGCCGGATAA

1001 CCCGAAGTTT GCGCGGCGAA CGGGAAGTCG TTAATGCGAT GGCCGAATCG

1051 ATGAGTACTT TGGGACTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG

1151 GGGCGGTGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGTGT GTTGTTTATC

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG

1301 ccggSacgc GCCCGAAGTC ATTCAAGCCG CTTACCGCAT cggtgattcc

1351 gttaccaata ttattacgcc gatgatgagt tatttcgggc tgattatggc

1401 GACGGTAATC AAATACAAAA AAGATGCGGG CGTAGGCACG CTGATTTCTA

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTAA TTGCATGGAT CGCCTTATTC

1501 TGCATTTGGG TATTTGTTTT GGGTCTGCCC GTCGGTCCCG GCACACCCAC

1551 ATTCTATCCG GTGCCTTAA

This encodes a protein having amino acid sequence <SEQ ID 140>: 

1 MSQTDARRSG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS

51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAVI FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMAASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVAGSP FLKSIVVFIF LLFALPGIVY GRITRSLRGE REVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGAVFLKK FRLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGNAPQV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGTPTFYP VP*

ORF12ng shows 97.1% identity in 522 aa overlap with ORF12-1:

                    10        20        30        40        50        60
orf12-1.pep MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK
            |||||::|:|||||||||||||||||||||||||||||||||||||||||||||||||||
orf12ng     MSQTDARRSGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf12-1.pep GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
            ||||||||::||||:|||:|||||||||||||||||||||||||||||||||||||||||
orf12ng     GRADDGLIHVVSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf12-1.pep LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS
            ||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||
orf12ng     LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAVIFHSLGRHPLAGLAAAFAGVS
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf12-1.pep GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALIGYFVTEKI
            ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf12ng     GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMAASTFVIALIGYFVTEKI
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf12-1.pep VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12ng     VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf12-1.pep PETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAMAESMSTLGLYLVI
            ||||||:|||||||||||||||||||||||||:|||||||:|||||||||||||||||||
orf12ng     PETGLVAGSPFLKSIVVFIFLLFALPGIVYGRITRSLRGEREVVNAMAESMSTLGLYLVI
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf12-1.pep IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW
            |||||||||||||||||||||||||:||||||||||||||||||||||||||||||||||
orf12ng     IFFAAQFVAFFNWTNIGQYIAVKGAVFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf12-1.pep AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12ng     AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT
                   430       440       450       460       470       480

                   490       500       510       520
orf12-1.pep LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
            ||||||||||||||||||||||||||||||||||:|||||:||
orf12ng     LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVPX
                   490       500       510       520

In addition, ORF12ng shows significant homology with a hypothetical protein from E.coli:

sp|P46133|YDAH_ECOLI HYPOTHETICAL 55.1 KD PROTEIN IN OGT-DBPA INTERGENIC REGION
>gi|1787597 (AE000231) hypothetical protein in ogt 5'region [Escherichia coli]
Length = 510

Identities = .../507 (...%), Positives = 281/507 (55%), Gaps = 15/507 (2%)



Query: 8 RSGRFLRTVEWLGNMLPHPVTXXXXXXXXXXXASAVGAYFGLSVPDPRPVGAKGRADDGL 67

y ' +SG+ VE 4-GN + PHP + FG+S +P D 
Sbjct: 13 QSGKLYGWVERIGNKVPHPFLLFIYLIIVLMVTTAILSAFGVSAKNP TDGTP 64

Query: 68 IHVVSLLDADGLIKILTHTVKNFTGFAPXXXXXXXXXXXXIAEKSGLISALMRLLLTKSP 127

Query. in^ ^ ^ ^ + +KNF+GFAP +AE+ GL+ ALM + + 

Sbjct: 65 VWKNLLSVEGLHWFLPNVIKNFSGFAPLGAILALVLGAGLAERVGLLPALMVKMASHVN 124

Query: 128 RKLTTFMVVFTGILSNTASELGYVVLIPLSAVIFHSLGRHPLAGLAAAFAGVSGGYSANL 187
Query, xt* ++MV+f s+ +s+ v++ p+ A+IF ++GRHP+AGL AA AGV G++ANL 

Sbjct: 125 ARYASYMVLFIAFFSHISSDAALVIMPPMGALIFLAVGRHPVAGLLAAIAGVGCGFTANL 184

Query: 188 FLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMAASTFVIALIGYFVTEKIVEPQLGP 247
Query. + J D LL+GI+ +AA +P V NW+FMA+S V + ++G +T+KI+EP+LG 

Sbjct: 185 LIVTTDVLLSGISTEAAAAFNPQMHVSVIDNWYFMASSWVLTIVGGLITDKIIEPRLGQ 244 

Query: 248 YQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRHPETGLVA 307
Query. xu^ w + + + g gl + ^ +A ++P +GILR P V 

Sbjct: 245 WQGNSDEKLQTLTESQRF GLRIAGWSLLFIAAIA1WVIPQNGILRDPINHTVM 298 

Query: 308 GSPFLKSIVVFIFLLFALPGIVYGRITRSLRGEREVVNAMAESMSTLGLYLVIXXXXXXX 367

SPF+K IV I L F + + YG TR++R + ++ + M E M + ♦+ 
Sbjct: 299 PSPFIKGIVPLIILFFFWSLAYGIATRTIRRQADLPHLMIEPMKEMAGFIVMVFPLAQF 358 

Query: 368 XXXXNWTNIGQYIAVKGAVFLKEVGLGGSVLFIGFILICAFINLMIGSASAQWAVTAPIF 427
NW+N+G++IAV L+ GL 3 F+G L+ +F+ + I S SA W++ APIF
Sbjct: 359 VAMFNWSNMGKFIAVGLTDILESSGLSGIPAFVGLALLSSFLCMFIASGSAIWSILAPIF 418

Query: 428 VPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGTLISMMLP 487

VPM ML G+ P Q +RI DS + P+ + L + + +YK DA +GT S++LP 
Sbjct: 419 VPMFMLLGFHPAFAQILFRIADSSVLPLAPVSPFVPLFLGFLQRYKPDAKLGTYYSLVLP 478

Query: 488 YSAFFLIAWIALFCIWVFVLGLPVGPG 514

Y FL+ W+ + W +++GLP+GPG 
Sbjct: 479 YPLIFLVVWLLMLLAW-YLVGLPIGPG 504 

Based on this analysis, including the presence of several putative transmembrane domains and the 
predicted actinin-type actin-binding domain signature (shown in bold) in the gonococcal protein, 
it is predicted that the proteins from N.meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 
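
This document does not state which program was used to predict the putative transmembrane domains. As an illustration only, the sketch below uses a Kyte-Doolittle hydropathy scan, a common way to flag candidate membrane-spanning segments; the 19-residue window and the 1.6 cut-off are the conventional values for that method and are assumptions here, as are the function names.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein, window=19):
    """Return (start_position, mean_hydropathy) for every full window.

    Positions are 1-based; residues not in the table (e.g. X) score 0.
    """
    seq = protein.replace(" ", "").upper()
    scores = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KD.get(aa, 0.0) for aa in segment) / window
        scores.append((start + 1, mean))
    return scores


def candidate_tm_starts(protein, window=19, threshold=1.6):
    """Start positions of windows whose mean hydropathy reaches the threshold."""
    return [pos for pos, mean in hydropathy_windows(protein, window) if mean >= threshold]


# Residues 1-60 of ORF12-1 <SEQ ID 136>.
n_term = "MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK"
starts = candidate_tm_starts(n_term)
print(starts)
```

Windows that pass the threshold only mark candidates; a dedicated topology predictor would be needed to assign domain boundaries.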

Example 17

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 141>: 

1 . . ACAGCCGGCG CAGCAGGTTn CnCGGTCTTC GTTTTCGTAA CGGACAGTCA 

51 GGTGGAGGTG TTCGGGAACA TCCAGACCGC AGTGGAAACA GGTTTTTTTC 

101 ATGGCATTTC GGTTTCGTCT GTGTTTGGTG CGGCGGCACA AGACTCGGCA 

151 ATgGCTTCGC GCAGTGCGTC TATACCGGTA TTTTCAGCAA CGGAAATGCG

201 GACGGcGgCA ATTTTTCCCG CAGCGTCGCG CCATATGCCC GTGTTTTgTT 

251 CTTCAGACGG CAGCAGGTCG GTTTTGTTGT ACACCTTgAT GCACGGAaTA 

301 TCGCCGGCAT GGATTTCTTG CAGTACGTTT TCCACGTCTT CAATCTGCTG 

351 TCCGCTGTTC GGAGCGGCGG CATCGACGAC GTGCAGCAGC ACATCgGcTT 

401 gCGCGGTTTC TTCCAGCGTG GCgGAAAAGG CGGAAATCAG TTTgTGCGGC

451 agATyGCTnA CGAATCCGAC GGTATCGGTC AGGATAATGC TGCATTCGGG

501 ACT. . 

This corresponds to the amino acid sequence <SEQ ID 142; ORF14>: 

1 ..TAGAAGXXVF VFVTDSQVEV FGNIQTAVET GFFHGISVSS VFGAAAQDSA 

51 MASRSASIPV FSATEMRTAA IFPAASRHMP VFCSSDGSRS VLLYTLMHGI

101 SPAWISCSTF STSSICCPLF GAAASTTCSS TSACAVSSSV AEKAEISLCG 

151 RXLTNPTVSV RIMLHSG. . 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N.meningitidis (strain A) 
ORF14 shows 94.0% identity over a 167aa overlap with an ORF (ORF14a) from strain A of N.
meningitidis: 

10 20 30 

orfl4 pep TAGAAGXXVFVFVT DSQVE VFGN I QT AVET 

I : I I I I I I I I I I I : I : : I I I I : I 1 1 I I 
40 or f 1 4 a GRQLG FLRVGGALFV IT AQARVNN ALC DCLTTGAAG FAVFV FVT DGQMQVFGNVQP AVET 

150 160 170 180 190 200 

40 50 60 70 80 90 

orfl4.pep GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAI FPAASRHMPVFCSSDGSRS 
45 I I I I I I I I I I I I I I I I I I 1 I I I I 1 I 1 I 1 I t I I I I I I t I M I 1 I I I I M I I t II I I I t I I 

or f 1 4 a GFFHGI SVSSVFGAAAQYSAMASRSAS I PVFSATEMRTAAI FPAASRHMPVFCSSDGSRS 

210 220 230 240 250 260 

100 110 120 130 140 150 

50 orf 14 .pep VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 

| | | | || | | | | I I I I I II II II I II I I 1 I I I I I I II I M I I I M M I I I I I I I II I I I I I I 
orf 14a VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 
270 280 290 300 310 320

160

orf14.pep  RXLTNPTVSVRIMLHSG
           | |||||||||||||||
orf14a     RSLTNPTVSVRIMLHSGLMYSRRAVVSSVAKSWSFAYMPDLVSRLNRLDLPTLVX
330 340 350 360 370 380

The complete length ORF14a nucleotide sequence <SEQ ID 143> is: 

1 ATGGAGGATT TGCAGGAAAT CGGGTTCGAT GTCGCCGCCG TAAAGGTAGG 

51 TCGGCAGCGC GAACATCATC GTCTGCATCA TCCCCAGCCC GGCAACGGCG 

101 AGGCGGACGA TGTATTGTTT GCGTTCTTTT TGGTTGGCGG CTTCGATTTT 

151 TTGCGCGTCA TAGGGTGCGG CGGTGTAGCC TATCTGCCTG ATTTTCAACA

201 GAATGTCGGA AAGGCGGATT TTGCCGTCGT CCCAGACGAC GCGGCAGCGG 

251 TGCGTGCTGT AATTGAGGTC GATGCGGACG ATGCCGTCTG TACGCAAAAG 

301 CTGCTGTTCG ATCAGCCAGA CGCAGGCGGC GCAGGTGATG CCGCCGAGCA 

351 TTAAAACCGC CTCGCGCGTG CCGCCGTGGG TTTCCACAAA GTCGGACTGG 

401 ACTTCGGGCA GGTCGTACAG GCGGATTTGG TCGAGGATTT CTTGGGGCGG

451 CAGCTCGGTT TTTTGCGCGT CGGCGGTGCG TTGTTTGTAA TAACTGCCCA

501 AGCCCGCGTC AATAATGCTT TGTGCGACTG CCTGACAACC GGCGCAGCAG 

551 GTTTCGCGGT CTTCGTTTTC GTAACGGACG GTCAGATGCA GGTTTTCGGG 

601 AACGTCCAGC CCGCAGTGGA AACAGGTTTT TTTCATGGCA TTTCGGTTTC 

651 GTCTGTGTTT GGTGCGGCGG CACAATACTC GGCAATGGCT TCGCGCAGTG

701 CGTCTATACC GGTATTTTCA GCAACGGAAA TGCGGACGGC GGCAATTTTT 

751 CCCGCAGCGT CGCGCCATAT GCCCGTGTTT TGTTCTTCAG ACGGCAGCAG 

801 GTCGGTTTTG TTGTACACCT TGATGCACGG AATATCGCCG GCATGGATTT 

851 CTTGCAGTAC GTTTTCCACG TCTTCAATCT GCTGTCCGCT GTTCGGAGCG

901 GCGGCATCGA CGACGTGCAG CAGCACATCG GCTTGCGCGG TTTCTTCCAG

951 CGTGGCGGAA AAGGCGGAAA TCAGTTTGTG CGGCAGATCG CTGACGAATC

1001 CGACGGTATC GGTCAGGATA ATGCTGCATT CGGGACTGAT GTACAGCCGC

1051 CGCGCCGTCG TGTCGAGTGT GGCGAAAAGC TGGTCTTTCG CATATATGCC 

1101 CGACTTGGTC AGCCGGTTGA ACAGACTGGA TTTGCCGACA TTGGTATAG 

This encodes a protein having amino acid sequence <SEQ ID 144>:

1 MEDLQEIGFD VAAVKVGRQR EHHRLHHPQP GNGEADDVLF AFFLVGGFDF

51 LRVIGCGGVA YLPDFQQNVG KADFAVVPDD AAAVRAVIEV DADDAVCTQK

101 LLFDQPDAGG AGDAAEH*NR LARAAVGFHK VGLDFGQVVQ ADLVEDFLGR

151 QLGFLRVGGA LFVITAQARV NNALCDCLTT GAAGFAVFVF VTDGQMQVFG

201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF

251 PAASRHMPVF CSSDGSRSVL LYTLMHGISP AWISCSTFST SSICCPLFGA

301 AASTTCSSTS ACAVSSSVAE KAEISLCGRS LTNPTVSVRI MLHSGLMYSR

351 RAVVSSVAKS WSFAYMPDLV SRLNRLDLPT LV*

It should be noted that this sequence includes a stop codon at position 118. 
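
An internal stop codon of this kind can be located by scanning the translated sequence for the stop symbol. The sketch below is illustrative only; the function name `internal_stops` and its `first_position` parameter are assumptions. The fragment used is residues 101-120 of <SEQ ID 144>, with residue 103 read as F because the corresponding codons in <SEQ ID 143> are CTG CTG TTC.

```python
def internal_stops(protein, first_position=1, stop="*"):
    """Return 1-based positions of stop symbols that are not the final residue.

    `first_position` is the sequence position of the first character, so a
    fragment taken from the middle of a listing reports positions in the
    numbering of the full sequence.
    """
    trimmed = protein[:-1] if protein.endswith(stop) else protein
    return [first_position + i for i, aa in enumerate(trimmed) if aa == stop]


# Residues 101-120 of <SEQ ID 144>.
fragment = "LLFDQPDAGGAGDAAEH*NR"
print(internal_stops(fragment, first_position=101))  # [118]
```

A terminal stop, such as the one closing each complete listing, is deliberately not reported.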

Homology with a predicted ORF from N.gonorrhoeae

ORF14 shows 89.8% identity over a 167aa overlap with a predicted ORF (ORF14.ng) from N.
gonorrhoeae:

orf14.pep                                TAGAAGXXVFVFVTDSQVEVFGNIQTAVET 30
                                         || |||  ||:||:|:|::||||:| ||||
orf14ng    GRQFGFFRVGGASFVITAQAGIDDALCDCLTADAAGFAVFAFVADGQMQVFGNVQPAVET 208

orf14.pep  GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 90
           ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
orf14ng    GFFHGISVSSVFGAAAQYSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 268

orf14.pep  VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 150
           ||||||||||| |||||||||||||||||| |||||||||||||:|||:|||||||||||
orf14ng    VLLYTLMHGISWAWISCSTFSTSSICCPLFRAAASTTCSSTSACTVSSKVAEKAEISLCG 328

orf14.pep  RXLTNPTVSVRIMLHSG 167
           | |||||||||||||:|
orf14ng    RSLTNPTVSVRIMLHAGLMYSRRAVVSRVAKSWSFAYMPDLVSRLNRLDLPTLV 382

The complete length ORF14ng nucleotide sequence <SEQ ID 145> is predicted to encode a protein
having amino acid sequence <SEQ ID 146>:

1 MEDLQEIGFD VAAVKVGRQR EHHRLHHTQS ftwr^ann VLF AFFLVGGFDF
51 LRVIGCGGVA CLPDFQQNVG EADFAVVPDD AAAVRAVIEV DADDAVCAQK
101 LLFDQPDAGG AGNAAEHQHC FVRAIMGFHK VGLDFGQVVQ ADLVEDFLGR
151 QFGFFRVGGA SFVITAQAGI DDALCDCLTA DAAGFAVFAF VADGQMQVFG
201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF
251 PAASRHMPVF CSSDGSRSVL LYTLMHGISW AWISCSTFST SSICCPLFRA
301 AASTTCSSTS ACTVSSKVAE KAEISLCGRS LTNPTVSVRI MLHAGLMYSR
351 RAVVSRVAKS WSFAYMPDLV SRLNRLDLPT LV*

Based on the putative transmembrane domain in the gonococcal protein, it is predicted that the
proteins from N.meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for
vaccines or diagnostics, or for raising antibodies.

Example 18 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 147>: 

1 . . GGCCATTACT CCGACCGCAC TTGGAAGCCG CGTTTGGNCG GCCGCCGTCT 

51 GCCGTATCTG CTTTATGGCA CGCTGATTGC GGTTATTGTG ATGATTTTGA

101 TGCCGAACTC GGGCAGCTTC GGTTTCGGCT ATGCGTCGCT GGCGGCTTTG 

151 TCGTTCGGCG CGCTGATGAT TGCGCTGTTA GACGTGTCGT CAAATATGGC 

201 GATGCAGCCG TTTAAGATGA TGGTCGGCGA CATGGTCAAC GAGGAGCAGA 

251 AAA . NTACGC CTACGGGATT CAAAGTTTCT TAGCAAATAC GGGCGCGGTC 

301 GTGGCGGCGA TTCTGCCGTT TGTGTTTGCG TATATCGGTT TGGCGAACAC

351 CGCCGANAAA GGCGTTGTGC CGCAGACCGT GGTCGTGGCG TTTTATGTGG 

401 GTGCGGCGTT GCTGGTGATT ACCAGCGCGT TCACGATTTT CAAAGTGAAG 

451 GAATACGANC CGGAAACCTA CGCCCGTTAC CACGGCATCG ATGTCGCCGC 

501 GAATCAGGAA AAAGCCAACT GGATCGCACT CTTAAAA . CC GCGC. . 

This corresponds to the amino acid sequence <SEQ ID 148; ORF16>: 

1 . .GHYSDRTWKP RLXGRRLPYL LYGTLIAVIV MILMPNSGSF GFGYASLAAL 

51 SFGALMIALL DVSSNMAMQP FKMMVGDMVN EEQKXYAYGI QSFLANTGAV 

101 VAAILPFVFA YIGLANTAXK GVVPQTVVVA FYVGAALLVI TSAFTIFKVK 

151 EYXPETYARY HGIDVAANQE KANWIALLKX A.. 
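
The correspondence between a nucleotide sequence and its encoded protein can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the names `translate`, `CODON_TABLE` and `fragment` are illustrative and do not come from the specification. Codons containing an undetermined base (N or a gap) are reported as X, matching the convention used in the partial sequences above.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; any codon with an unknown base becomes 'X'."""
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-base groups
    usable = len(dna) - len(dna) % 3    # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


# First 48 bases of the partial sequence <SEQ ID 147> (assumed to be in frame).
fragment = "GGCCATTACT CCGACCGCAC TTGGAAGCCG CGTTTGGNCG GCCGCCGT"
print(translate(fragment))  # GHYSDRTWKPRLXGRR
```

The printed peptide agrees with the start of <SEQ ID 148>, including the X at the position where the nucleotide sequence contains N.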

Further work revealed the complete nucleotide sequence <SEQ ID 149>: 

1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC 

51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG 

101 CCTTTACCCT GCAAAGCTCG CAAATGAGCC GCATTTTTCA AACGCTAGGC 

151 GCAGACCCGC ACAATTTGGG CTGGTTTTTC ATCCTGCCGC CGCTGGCGGG 

201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC 

251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT 

301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG 

351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT 

401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC 

451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT 

501 CTTAGCAAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG 

551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC 

601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC 

651 GTTCACGATT TTCAAAGTGA AGGAATACGA TCCGGAAACC TACGCCCGTT 

701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA 

751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT 

801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA 

851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG 

901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC 

951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG 

1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT 

1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG 

1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG 

1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCTTGTT TAACGGCTCT 

1201 ATCTGTATGC CTCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC 

1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC 

1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG 

1351 GTTTGA 
This corresponds to the amino acid sequence <SEQ ID 150; ORF16-1>: 

1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG 
51 ADPHNLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI 
101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG 
151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT 
201 VVVAFYVGAA LLVITSAFTI FKVKEYDPET YARYHGIDVA ANQEKANWIE 
251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ 
301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV 
351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS 
401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG 
451 V* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF16 shows 96.7% identity over a 181aa overlap with an ORF (ORF16a) from strain A of N. meningitidis: 

10 20 30 

Arfl , .„ GHYSDRTWKPRLXGRR LPYLLYGTLIAVIV 
orflb.pep Tl I I I I I I I I I I II III IHI I II I i in 

orf 16a T-nTTr,ADPHSLGW FFILP?IAGMLVQPIVG HYSDRTWKPRLGGRRLPYLLYGTLIAVIV 
20 " 50 60 70 80 90 100 

40 50 60 70 80 90 

orf 16 oeo MTT.MPNSGSFGFGY ASLAAliSFGALMIALLDV SSNMAMQPFKMMVGDMVNEEQKXYAYGI 
orf lb. pep -j-j-j" t ! t i ft | i | | | i I It I I : H ) 1 I I I il II I! 111111111111111111111 I I 11 I 
?S or f 1 6a mtt.mpm5;g?;FGFGY AS LAALS FGALMIALLDV SSNMAMQPFKMMVGDMVNEEQKGYAYGI 
" HO 120 130 140 150 160 

100 110 120 130 140 150 

orf 16 DeP n-qFLANTGA WAAILPFVFAYIGLA NTAXKGWPQTVWAFYVGAALLVITSAF7IFKVK 
30 ° " P " lllllt Mil Mill M I I Ml I I I Ml U I M I I M M I I t U M I I M H I I M M I 

. 16 O^FIANTG AWAAI LPFVFAYIGLA NTAEKGVVPQTVVVAFYVGAALLVITSAFT I r KVrv 

170 180 190 200 210 220 

160 170 18C 

35 o-'l 6 . pep EYXPETYARYHGIDVAANQEKANWIALLKXA 

n mm mini I ii mm mi mn 

orf 16a EYN PET YARYHG I DVAANQEKANW IELLKTAPKAFWTVTLVQFFCW FAFQYMWTYS AG AI 

230 240 250 260 270 280 

40 or f 1 6a AENWHTTDASSVGYQEAGNWYGVIAA\^ 

290 300 310 320 330 340 

The complete length ORF16a nucleotide sequence <SEQ ID 151> is: 

1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC 

51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG 

101 CCTTTACCCT GCAAAGCTCG CAGATGAGCC GCATCTTCCA GACGCTCGGT 

151 GCCGATCCGC ACAGCCTCGG CTGGTTCTTT ATCCTGCCGC CGCTGGCGGG 

201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC 

251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT 

301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG 

351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT 

401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC 

451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT 

501 CTTAGCGAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG 

551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC 

601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC 

651 GTTCACGATT TTCAAAGTGA AGGAATACAA TCCGGAAACC TACGCCCGTT 

701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA 

751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT 

801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA 

851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG 

901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC 

951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG 

1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT 

1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG 

1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG 

1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCCTGTT TAACGGCTCT 

1201 ATCTGTATGC CGCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC 

1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC 

1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG 

1351 GTTTGA 



This encodes a protein having amino acid sequence <SEQ ID 152>: 

1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG 
51 ADPHSLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI 
101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG 
151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT 
201 VVVAFYVGAA LLVITSAFTI FKVKEYNPET YARYHGIDVA ANQEKANWIE 
251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ 
301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV 
351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS 
401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG 
451 V* 



ORF16a and ORF16-1 show 99.6% identity in 451 aa overlap: 

                    10        20        30        40        50        60
orf16a.pep  MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHSLGWFF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
orf16-1     MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf16a.pep  ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf16a.pep  LAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     LAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILP
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf16a.pep  FVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVKEYNPETYARYHGIDVA
            ||||||||||||||||||||||||||||||||||||||||||||||:|||||||||||||
orf16-1     FVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVKEYDPETYARYHGIDVA
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf16a.pep  ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf16a.pep  EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf16a.pep  LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG
                   370       380       390       400       410       420

                   430       440       450
orf16a.pep  GLQATMFLVGGVVLLLGAFSVFLIKETHGGVX
            ||||||||||||||||||||||||||||||||
orf16-1     GLQATMFLVGGVVLLLGAFSVFLIKETHGGVX
                   430       440       450
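
The identity figure quoted for an alignment can be recomputed from the aligned rows. The following is a minimal sketch, assuming that identity is the number of identical columns divided by the number of aligned columns (gap columns included in the overlap); the alignment program used in the specification may count the overlap differently, and the function name `percent_identity` is illustrative only.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of aligned columns holding the same residue in both rows.

    Both rows must already be aligned (equal length, gaps shown as `gap`).
    Columns that are gaps in both rows are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have the same length")
    overlap = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue
        overlap += 1
        if a == b and a != gap:
            matches += 1
    if overlap == 0:
        raise ValueError("no aligned columns")
    return 100.0 * matches / overlap


# Residues 51-60 of ORF16a and ORF16-1 differ at one position (S versus N).
print(percent_identity("ADPHSLGWFF", "ADPHNLGWFF"))  # 90.0

# Two differing residues over the 451 aa overlap give the quoted figure.
print(round(100.0 * 449 / 451, 1))  # 99.6
```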
Homology with a predicted ORF from N. gonorrhoeae 

ORF16 shows 93.9% identity over a 181aa overlap with a predicted ORF (ORF16.ng) from N. gonorrhoeae: 

orf 16. pep 
orf 16ng 
orf 16. pep 
orf 16ng 
orf 16. pep 
orf 16ng 
orfl6.pep 
orf 16ng 



GHYSDRTWKPRLXGRRLPYLLYGTLIAVIV 

immiimmii 

HFSNARRRPAQFGLVFHPAAAGGDAGSADSGYYSDRTWKPRLGGRRLPYLLYGTLIAVIV 

MIIilPNSGSFGFGYASIAALSFGAl^IALLDVSSNMAMQPFKMMVGDMVNEEQKXYAYGI 
I, , I I I H, m , H I | I I I I MH I t I I I lit IIM I M I I I I I M I I II I M Mill 
MILMPNSGSFGFGYASLAALS FGALMIALLDVSSNMAMQPFKMMVGDMVNEEQK5YAYGI 

OSFIJ^TGAWAAILPFVFAYIGLANTAXKGVVPQTVVVAFYVGAALLVITSAFTIFKVK 

III III! (Ml III III M I I III Hi IIIIIIIIIMMIIIIII-.MIIMI III 
QSFLANTDAWAAILPFVFAYIGIANTAEKGWPQT\AA7AFYVGAALLIITSAFTiSK\^ 



30 



131 
90 
191 
150 



251 



181 



E YX PET YARYHG I DVAAN QEKAN W I ALLKXA 

eydUtyaryhgidvaan 311 



The complete length ORF16ng nucleotide sequence <SEQ ID 153> is: 

1 ATGATAGGGG ATCGCCGCGC CGGCAACCAT TTCGGATTTT CCAAAGCAAA 
51 TACTTTTCAA ATCAAAAAAA AGGATTTACT TTATGTCGGA ATATACGCCT 
101 CAAACAGCAA AACAAGGTTT GCCCGCGCCG GCAAAAAGCA CGATTTGGAT 
151 GTTGAGCTTC GGCTATCTCG GCGTTCAGAC GGCCTTTACC CTGCAAAGCT 
201 CGCAGATGAG CCGCATTTTT CAAACGCTAG GCGCAGACCC GCACAATTTG 
251 GGCTGGTTTT TCATCCTGCC GCCGCTGGCG GGGATGCTGG TTCAGCCGAT 
301 AGTGGCTACT ACTCAGACCG CACTTGGAAG CCGCGCTTGG GCGGCCGCCG 
351 CCTGCCGTAT CTGCTTTACG GCACGCTGAT TGCGGTCATC GTGATGATTT 
401 TGATGCCGAA CTCGGGCAGC TTCGGTTTCG GCTATGCGTC GCTGGCGGCC 
451 TTGTCGTTCG GCGCGCTGAT GATTGCGCTG TTGGACGTGT CGTCGAATAT 
501 GGCGATGCAG CCGTTTAAGA TGATGGTCGG CGATATGGTC AACGAGGAGC 
551 AGAAAAGCTA CGCCTACGGG ATTCAAAGTT TCTTAGCGAA TACGGACGCG 
601 GTTGTGGCAG CGATTCTGCC GTTTGTGTTC GCGTATATCG GTTTGGCGAA 
651 CACTGCCGAG AAAGGCGTTG TGCCACAAAC CGTGGTCGTA GCATTCTATG 
701 TGGGTGCGGC GTTACTGATT ATTACCAGTG CGTTCACAAT CTCCAAAGTC 
751 AAAGAATACG ACCCGGAAAC CTACGCCCGT TACCACGGCA TCGATGTCGC 
801 CGCGAATCAG GAAAAAGCCA ACTGGTTCGA ACTCTTAAAA ACCGCGCCTA 
851 AAGTGTTTTG GACGGTTACT CCGGTACAGT TTTTCTGCTG GTTCGCCTTC 
901 CGGTATATGT GGACTTACTC GGCAGGCGCG ATTGCAGAAA ACGTCTGGCA 
951 CACTACCGAT GCGTCTTCCG TAGGCCATCA GGAGGCGGGC AACCGGTACG 
1001 GCGTTTTGGC GGCGGTGTAG 




This encodes a protein having amino acid sequence <SEQ ID 154>: 

1 MIGDRRAGNH FGFSKANTFQ IKKKDLLYVG IYASNSKTRF ARAGKKHDLD 
51 VELRLSRRSD GLYPAKLADE PHFSNARRRP AQFGLVFHPA AAGGDAGSAD 
101 SGYYSDRTWK PRLGGRRLPY LLYGTLIAVI VMILMPNSGS FGFGYASLAA 
151 LSFGALMIAL LDVSSNMAMQ PFKMMVGDMV NEEQKSYAYG IQSFLANTDA 
201 VVAAILPFVF AYIGLANTAE KGVVPQTVVV AFYVGAALLI ITSAFTISKV 
251 KEYDPETYAR YHGIDVAANQ EKANWFELLK TAPKVFWTVT PVQFFCWFAF 
301 RYMWTYSAGA IAENVWHTTD ASSVGHQEAG NRYGVLAAV* 



ORF16ng and ORF16-1 show 89.3% identity in 261 aa overlap: 



orf 16-1. pep 
orf 16ng 



orfl6-l.pep 
orf I6ng 



30 40 50 60 70 80 

MLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFFILPPLAGMLVQPI-VGHYSDRT 

1 : : I I I II : 1:11111 
DVELRLSRRSDGLYPAKLADEPHFSNARRRPAQFGLVF-HPAAAGGDAGSADSGYYSDRT 

50 60 70 80 90 100 

90 100 110 120 130 140 

WKPRLGGRRLPYLLYGTLIAVIVMIli4PNSGSFGFGYASLAALSFGALMIALLDVSSNMA 

I 1 | | | | I 1 I I I 1 1 I t t 1 1 1 I f I 1 I 1 I 1 I I 1 1 I 1 I 1 1 I I 1 I I I I 1 K t I I > I I I t 1 1 1 i I < > 
yn<PRLGGRRLPYLLYGTLIAVI\MIIJ4PNSG5FGF 
H0 120 130 140 150 160 
or f 16-1. pep 
orf I6ng 



orf 16-1 .pep 
orf 16ng 



orf 16-1 .pep 
orf 16ng 



150 160 170 180 190 200 

MQPFKMMVGDMVNEEQKGYAYG I QSFLANTGAVVAAI LPFVFAY IGLANTAEKGVVPQTV 

I I 1 1 I t I 1 I ! 1 I 1 | I | | r k 1 1 I 1 M I M I t MMIMI IMMMIIMI lllllllll 
MQPFKMMVGDMVNEEQKSYAYGIQSF1ANTDAVVAAXLPFVFAYIGLANTAEKGVVPQTV 

170 180 190 200 210 220 

210 220 230 240 250 260 

WAFYVGAALLVITSAFTIFKVKEYDPETYARYHGIDVAANQEKANWIELLKTAPKAFWT 

! 1 1 t 1 I t I I t I : I t I 1 I 1 t 1 I I 1 1 I t I I 1 I I I I | | | I 1 I I 1 I J I I 1 = 1 I I I I I I J = I I I 
WAFYVGAALLIITSAFTISKVKEYDPETYARYHGIDVAANQEKANWFELLKTAPKVFWT 

230 240 250 260 270 280 

270 280 290 300 310 320 

VTLVQF FCW FAFQYMWTY SAGAIAENVWHTTDAS S VGYQEAGNW YGVLAAVQS VAAVI C S 

II lltlMIM:||ltlitllllllll!IMilin:tllil I I I I I I I 
VT PVQFFCW FAFRYMWT YS AG AI AENVWHTT DAS S VGHQEAGNR YGVLAAVX 

290 300 310 320 330 340 



Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
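
Putative transmembrane segments of this kind are commonly located by scanning a protein for strongly hydrophobic stretches. The following is a minimal sketch of one such scan, a sliding-window average over the Kyte-Doolittle hydropathy scale. The specification does not state in this passage which prediction method was used, so the scale, the 19-residue window, the 1.6 threshold and the names `hydropathy_profile` and `candidate_windows` are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; unknown residues (e.g. X) score 0."""
    scores = [KYTE_DOOLITTLE.get(res, 0.0) for res in protein.upper()]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


def candidate_windows(protein: str, window: int = 19,
                      threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean hydropathy exceeds threshold."""
    profile = hydropathy_profile(protein, window)
    return [pos + 1 for pos, mean in enumerate(profile) if mean > threshold]


# Residues 91-120 of ORF16-1, taken from <SEQ ID 150>.
segment = "LPYLLYGTLIAVIVMILMPNSGSFGFGYAS"
print(candidate_windows(segment))
```

Runs of consecutive start positions returned by `candidate_windows` indicate a hydrophobic stretch long enough to be worth examining as a membrane-spanning candidate; such a scan is only a screening heuristic.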



Example 19 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 155>: 

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGCATA CCTTGATGCT 

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG NAAACACGTT GNCAAAGACC AAATCCGNGN CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AA.NTGACGG 

251 GNATTTTGAN GGCAGGGCTG GACAAACCCT TCCAAATAGT TNAGGATACC 

301 CCGAGCTATG C.TGCCACCA AGCCCTGCCG GTCAAACTCG GATCGNCTGG 

351 CAGCCAGAAT . . . 

This corresponds to the amino acid sequence <SEQ ID 156; ORF28>: 

1 MLFRKTTAAV LAHTLMLNGC TLMLWGMNNP VSETITRKHV XKDQIRXFGV 

51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA XXTGILXAGL DKPFQIVXDT 

101 PSYXCHQALP VKLGSXGSQN . . . 

Further work revealed the complete nucleotide sequence <SEQ ID 157>: 

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGCT 

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AAGCTGACGG 

251 GCATTTTGAA GGCAGGGCTG GACAAACCCT TCCAAATAGT TGAGGATACC 

301 CCGAGCTATG CTCGCCACCA AGCCCTGCCG GTCAAACTCG AATCGCCTGG 

351 CAGCCAGAAT TTCAGTACCG AAGGCCTTTG CCTGCGCTAC GATACCGACA 

401 AGCCTGCCGA CATCGCCAAG CTGAAACAGC TCGGGTTTGA AGCGGTCAAA 

451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA 

501 CTACGCCACA CCGCAAAAAC TGAACGCCGA TTACCATTTT GAGCAAAGTG 

551 TGCCTGCCGA TATTTATTAC ACGGTTACTG AAGAACATAC CGACAAATCC 

601 AAGCTGTTTG CAAATATCTT ATATACGCCC CCCTTTTTGA TACTGGATGC 

651 GGCGGGCGCG GTACTGGCCT TGCCTGCGGC GGCTCTGGGT GCGGTCGTGG 

701 ATGCCGCCCG CAAATGA 

This corresponds to the amino acid sequence <SEQ ID 158; ORF28-1>: 

1 MLFRKTTAAV LAATLMLNGC TLMLWGMNNP VSETITRKHV DKDQIRAFGV 

51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA KLTGILKAGL DKPFQIVEDT 

101 PSYARHQALP VKLESPGSQN FSTEGLCLRY DTDKPADIAK LKQLGFEAVK 

151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEEHTDKS 



201 KLFANILYTP PFLILDAAGA VLALPAAALG AVVDAARK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF28 shows 79.2% identity over a 120aa overlap with an ORF (ORF28a) from strain A of N. meningitidis: 

10 20 30 40 50 60 

orf28 pep MLFTlKTTAAVUUiTIJdl^G CTI^LWGMNNPVSETITRKHVXKDQIRXFGVVAEDNAQLEK 

t I I M I I I I 1 t I rTTTTTl I : I s I I I 1 : I III : I t I I Mill 1IIIMMMIM 
or f 2 8a MLFRKTTAAVLAATIJ<LNG CTVMMWGMNSPFSETTARKHVDKDOIRAFGVVA£PNAQLEK 
10 " 10 20 30 40 50 60 

70 80 90 100 110 120 

or*28 pep GSLVMMGGKYWFVVNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN 
II I I ! I I I I I I M I I I I I I I Mil Mill 11:1 M : : I It II It I : III 
1 5 or f 28a GSLVMMGGKYWFWNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN 

70 . 80 90 100 110 

orf28a FSTEGLCLRYDTDRPADIAKLKQLEFEAVELDNRTIYTRCVSAKGKYYATPQKLNADYHF 
120 130 140 150 160 170 

The complete length ORF28a nucleotide sequence <SEQ ID 159> is: 

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGTT 

51 GAACGGCTGT ACGGTAATGA TGTGGGGTAT GAACAGCCCG TTCAGCGAAA 

101 CGACCGCCCG CAAACACGTT GACAAGGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGGAAATAC TGGTTCGTCG TCAATCCTGA AGATTCGGCG AAGCTGACGG 

251 GCATTTTGAA GGCCGGGTTG GACAAGCAGT TTCAAATGGT TGAGCCCAAC 

301 CCGCGCTTTG CCTACCAAGC CCTGCCGGTC AAACTCGAAT CGCCCGCCAG 

351 CCAGAATTTC AGTACCGAAG GCCTTTGCCT GCGCTACGAT ACCGACAGAC 

401 CTGCCGACAT CGCCAAGCTG AAACAGCTTG AGTTTGAAGC GGTCGAACTC 

451 GACAATCGGA CCATTTACAC GCGCTGCGTC TCCGCCAAAG GCAAATACTA 

501 CGCCACACCG CAAAAACTGA ACGCCGATTA TCATTTTGAG CAAAGTGTGC 

551 CTGCCGATAT TTATTACACG GTTACGAAAA AACATACCGA CAAATCCAAG 

601 TTGTTTGAAA ATATTGCATA TACGCCCACC ACGTTGATAC TGGATGCGGT 

651 GGGCGCGGTG CTGGCCTTGC CTGTCGCGGC GTTGATTGCA GCCACGAATT 

701 CCTCAGACAA ATGA 

This encodes a protein having amino acid sequence <SEQ ID 160>: 

1 MLFRKTTAAV LAATLMLNGC TVMMWGMNSP FSETTARKHV DKDQIRAFGV 

51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA KLTGILKAGL DKQFQMVEPN 

101 PRFAYQALPV KLESPASQNF STEGLCLRYD TDRPADIAKL KQLEFEAVEL 

151 DNRTIYTRCV SAKGKYYATP QKLNADYHFE QSVPADIYYT VTKKHTDKSK 

201 LFENIAYTPT TLILDAVGAV LALPVAALIA ATNSSDK* 

ORF28a and ORF28-1 show 86.1% identity in 238 aa overlap: 

10 20 30 40 50 60 

0^f28a pep MLFRKTTAAVLAATLMLNGCTVMMWGMN S PFSETTARKHVDKDQIRAFGWAEDNAQLEK 

45 " " II M I M II I I M II I I II M : I : I I M : I Ml : I II M I I II I I I II M I I I I M M 

orf28-l MLFRKTTAAVLAATLMLNGCTLMLWGMNNPVSETITRKHVDKDQIRAFGWAEDNAQLEK 

10 20 30 40 50 60 

70 80 90 100 110 119 

50 orf 28a .peo GSLVMMGGKYWFWNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN 

I II I I I ! I I I M II M II M I M I M M II M I I M I : ! : I : I II I I M I I I : M I 
orf 28-1 GSLVMMGGKYWF^A^PEDSAKLTGILKAGLDKPFOIVEDTPSYARHQALPVKLESPGSQN 

70 80 90 100 110 120 

55 120 130 140 150 160 170 179 

or f 28a . pep FSTEGU:iAYDTDRPADIAKIJCQl£FEAVELDNRTIYTRCVSAKGKYYATPQKLNADYHF 
I ) | I | I I I I 1 1 I I : II M M I I I I I I M : II M M I M M II I M I I I I II I M M M I 
or f 2 8 - 1 FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRT I YTRCVSAKGKYYATPQKLNADYKF 

130 140 150 160 170 180 
180 190 200 210 220 230 

orf28a.pep EQSVPADI YYT VTKKHTDKSKLFEN 1AYTPTTLI k D ^V<^Vl^^A^*^TOSS DKX 

orf28-l EQSVPADI Y^^EEHTDKSKLFAN I LYTPPFLILDAAGAVIJ^PAAALGAVVDAARKX 

0rtZH A 190 200 210 220 230 

Homology with a predicted ORF from N. gonorrhoeae 

ORF28 shows 84.2% identity over a 120aa overlap with a predicted ORF (ORF28.ng) from N. gonorrhoeae: 



MLFRKTTAAVIJUiTLMLNGCTLMLWGMNNPVSETITRKH 



60 



orf28.pep fu*FRKTTAAVIJUiTIjMI^VA»i ww»nonmif»«* iiiwn»«»«if-— ~ - ,TT, ill i i 

0 pp mi mi mi 1 1: ii iii: it imiiii:iiii in miii in Milium 

orf28ng mI^TAA^ 60 

> orf28 pep GSL^GGKYWFVVNPEDSAXXTGII^GLDKPFQITO^ 120 

> orno.pep i I I I I i I i i I I i . i | | | | 1 1 ||:| 111 II II I II Mill HUM I: ' MM 
orf28ng GSLVM^GKYWFATOPEDSAKLTGLLKAGL^ 120 

The complete length ORF28ng nucleotide sequence <SEQ ID 161> is: 



1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATACT 

51 GAACGGCTGT ACGATGATGT TGCGGGGGAT GAACAACCCG GTCAGCCAAA 

101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGGAAATAC TGGTTCGCCG TCAATCCCGA AGATTCGGCG AAGCTGACGG 

251 GCCTTTTGAA GGCCGGGTTG GACAAGCCCT TCCAAATAGT TGAGGATACC 

301 CCGAGCTATG CCCGCCACCA AGCCCTGCCG GTCAAATTCG AAGCGCCCGG 

351 CAGCCAGAAT TTCAGTACCG GAGGTCTTTG CCTGCGCTAT GATACCGGCA 

401 GACCTGACGA CATCGCCAAG CTGAAACAGC TTGAGTTTAA AGCGGTCAAA 

451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA 

501 CTACGCCACG CCGCAAAAAC TGAACGCCGA TTATCATTTT GAGCAAAGTG 

551 TGCCCGCCGA TATTTATTAT ACGGTTACTG AAAAACATAC CGACAAATCC 

601 AAGCTGTTTG GAAATATCTT ATATACGCCC CCCTTGTTGA TATTGGATGC 

651 GGCGGCCGCG GTGCTGGTCT TGCCTATGGC TCTGATTGCA GCCGCGAATT 

701 CCTCAGACAA ATGA 

This encodes a protein having amino acid sequence <SEQ ID 162>: 

1 MLFRKTTAAV LAATLILNGC TMMLRGMNNP VSQTITRKHV DKDQIRAFGV 

51 VAEDNAQLEK GSLVMMGGKY WFAVNPEDSA KLTGLLKAGL DKPFQIVEDT 

101 PSYARHQALP VKFEAPGSQN FSTGGLCLRY DTGRPDDIAK LKQLEFKAVK 

151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEKHTDKS 

201 KLFGNILYTP PLLILDAAAA VLVLPMALIA AANSSDK* 

ORF28ng and ORF28-1 share 90.0% identity in 231 aa overlap: 

10 20 30 40 50 60 

orf 28-1 . pep MLFRKTTAAVLAATLMLNGCTLMLWGMNNPVSETI 

PP I Ml lilt II Mill: III I I: M I M M I I : M M II I I M I M I M M II M I M II 
orf28nq MLFTOTTAAVLAATLILNGCTMMLRGMNNPVSQTITRKHVDKDQIRAFXa 
45 9 10 20 30 40 50 60 

70 80 90 100 110 120 

orf28-l pep GSLVMMGGKYWFWPEDSAKLTGII^^ 

° r P P , ,. , | | | | 1 I M : M II M I II M : I I II M M II I II I M M I II I II I M : I : I M I I 

50 orf28na GSLXH^GGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN 

JV y " " 80 90 100 110 120 



70 



orf28 l.pep UIIMM I : I I I 1 1 II M II I I I I I I I M M I II M I II II I 

orf28no FSTGGLCLR YDTGRPDDI AKLKQLE FKAVKLDNRT I YTRCVS AKGKYYAT PQKLNAD YH F 
y 130 140 150 160 170 180 

190 200 210 220 230 239 

*ft orf28-l oeD EQSVPADI YYTVTEEHTDKSKLFAN I LYTPPFLILDAAGAVLALPAAALGAVVDAARKX 

OU orf28 l.pep I 1 1 1 1 1 1 1 1 1 t I ^ I 1 1 1 1 1 1 I r 1 | 1 1 1 1 1 : 1 1 t I 1 1 : 1 1 1 = 1 1 I ::l: 

orf28ng EQSVPADI YYTVTEKHTDKSKLFGNILYTPPLLILDAAAAVLVLPMALIAAANSSDKX 



130 140 150 160 170 180 

FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF 
190 200 210 220 230 

Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF28-1 (24kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
6A shows the results of affinity purification of the GST-fusion protein, and Figure 6B shows the 
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise 
mice, whose sera were used for ELISA, which gave a positive result. These experiments confirm 
that ORF28-1 is a surface-exposed protein, and that it may be a useful immunogen. 

Example 20 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 163>: 

1 . GTCAGTCCTG TACTGCCTAT TACACACGAA CGGACAGGGT TTGAAGGTGT 

51 TATCGGTTAT GAAACCCATT TTTCAGGGCA CGGACATGAA GTACACAGTC 

101 CGTTCGATCA TCATGATTCA AAAAGCACTT CTGATTTCAG CGGCGGTGTA 

151 GACGGCGGTT TTACTGTTTA CCAACTTCAT CGAACATGGT CGGAAATCCA 

201 TCCGGAGGAT GAATATGACG GGCCGCAAGC AGCG.ATTAT CCGCCCCCCG 

251 GAGGAGCAAG GGATATATAC AGCTATTATG TCAAAGGAAC TTCAACAAAA 

301 ACAAAGACTA GTATTGTCCC TCAAGCCCCA TTTTCAGACC GTTGGCTAGA 

351 AGAAAATGCC GGTGCCGCCT CTGGT. . 

This corresponds to the amino acid sequence <SEQ ID 164; ORF29>: 

1 .VSPVLPITHE RTGFEGVIGY ETHFSGHGHE VHSPFDHHDS KSTSDFSGGV 
51 DGGFTVYQLH RTWSEIHPED EYDGPQAAXY PPPGGARDIY SYYVKGTSTK 
101 TKTSIVPQAP FSDRWLEENA GAASG. . 

Further work revealed the complete nucleotide sequence <SEQ ID 165>: 

1 ATGAATTTGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC 

51 GTTGCTGCAA ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAG CGGGTTTACG CCGTCCAGAC 

201 ATTTGATGCA ACTGCGGTCA GTCCTGTACT GCCTATTACA CACGAACGGA 

251 CAGGGTTTGA AGGTGTTATC GGTTATGAAA CCCATTTTTC AGGGCACGGA 

301 CATGAAGTAC ACAGTCCGTT CGATCATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGTGTAGACG GCGGTTTTAC TGTTTACCAA CTTCATCGAA 

401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC 

451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACAGCT ATTATGTCAA 

501 AGGAACTTCA ACAAAAACAA AGACTAATAT TGTCCCTCAA GCCCCATTTT 

551 CAGACCGTTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 

601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 

651 TTGGTGGGCT AACCGTATGG ATGATGTTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA 

751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA 

801 AGGTATTAAT GATTTAGGAA AATTAAGTCC GGAAGCACAA CTTGCTGCCG 

851 CGAGCCTATT ACAGGACAGT GCTTTTGCGG TAAAAGACGG TATCAACTCT 

901 GCCAAACAAT GGGCTGATGC CCATCCAAAT ATAACAGCTA CTGCCCAAAC 

951 TGCCCTTTCC GCAGCAGAGG CCGCAGGTAC GGTTTGGAGA GGTAAAAAAG 

1001 TAGAACTTAA CCCGACTAAA TGGGATTGGG TTAAAAATAC CGGTTATAAA 

1051 AAACCTGCTG CCCGCCATAT GCAGACTTTA GATGGGGAGA TGGCAGGTGG 

1101 GAATAAACCT ATTAAATCTT TACCAAACAG TGCCGCTGAA AAAAGAAAAC 

1151 AAAATTTTGA GAAGTTTAAT AGTAACTGGA GTTCAGCAAG TTTTGATTCA 

1201 GTGCACAAAA CACTAACTCC CAATGCACCT GGTATTTTAA GTCCTGATAA 

1251 AGTTAAAACT CGATACACTA GTTTAGATGG AAAAATTACA ATTATAAAAG 

1301 ATAACGAAAA CAACTATTTT AGAATCCATG ATAATTCACG AAAACAGTAT 

1351 CTTGATTCAA ATGGTAATGC TGTGAAAACC GGTAATTTAC AAGGTAAGCA 

1401 AGCAAAAGAT TATTTACAAC AACAAACTCA TATCAGGAAC TTAGACAAAT 

1451 GA 

This corresponds to the amino acid sequence <SEQ ID 166; ORF29-1>: 

1 MNLPIQKFMM LFAAAISLLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL 
51 FGNARGSVKK RVYAVQTFDA TAVSPVLPIT HERTGFEGVI GYETHFSGHG 
101 HEVHSPFDHH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS 
151 DYPPPGGARD IYSYYVKGTS TKTKTNIVPQ APFSDRWLKE NAGAASGFFS 
201 RADEAGKLIW ESDPNKNWWA NRMDDVRGIV QGAVNPFLMG FQGVGIGAIT 
251 DSAVSPVTDT AAQQTLQGIN DLGKLSPEAQ LAAASLLQDS AFAVKDGINS 
301 AKQWADAHPN ITATAQTALS AAEAAGTVWR GKKVELNPTK WDWVKNTGYK 
351 KPAARHMQTL DGEMAGGNKP IKSLPNSAAE KRKQNFEKFN SNWSSASFDS 
401 VHKTLTPNAP GILSPDKVKT RYTSLDGKIT IIKDNENNYF RIHDNSRKQY 
451 LDSNGNAVKT GNLQGKQAKD YLQQQTHIRN LDK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF29 shows 88.0% identity over a 125aa overlap with an ORF (ORF29a) from strain A of N. meningitidis: 

10 20 30 

^f?Q VSPVLPITHERTGFEGVIGYETHFSGHGHE 
° rt ^ ,P P I 1 1 C I 1 I 1 1 1 1 I I I I 1 = I I I I I I I I I I I 1 I 

05 orf29a EPGGKYHLFGNARGSVKNRVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHE 

" ori^a ^ ^ 7Q B0 9Q 10Q 

40 50 60 70 80 90 

orf29 Deo VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY 
30 <>""-P P | | | | | | : | | | | | ) | | | | | | | | | } I I I I I I I I I lllltll Mill:: I I I I I I I I I I I 

orf29a VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIY 
UO 120 130 140 150 160 

100 110 120 

35 orf29 .pep SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG 

Ml Ml II I I:: I I I: M MM I MM I II II I 
ft rf29a YVYVKGTSTKTKSNIVPRAPFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANR 
110 180 190 200 210 220 

40 orf29a MDDIRGIVCJGAVNPFLMGFQGVGIGAITDSAVSPVTDTAAQQTLQGXNHLGXLSPEAQLA 

230 240 250 260 270 280 

The complete length ORF29a nucleotide sequence <SEQ ID 167> is: 

1 ATGAATTNGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC 

51 GTNGCTGCAA ATCCCNATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTACG CCGTCCAAAC 

201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA 

251 CAGGATTTGA AGGCATTATC GGTTATGAAA CCCATTTTTC AGGACATGGA 

301 CATGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGCGTAGACG GTGGTTTTAC CGTTTACCAA CTTCATCGGA 

401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC 

451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACANNT ANTATGTCAA 

501 AGGAACTTCA ACAAAAACAA AGAGTAATAT TGTTCCCCGA GCCCCATTTT 

551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 

601 CGTGCTGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 

651 TTGGTGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA 

751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA 

801 AGGTATNAAT CATTTAGGAA ANTTAAGTCC CGAAGCACAA CTTGCGGCTG 

851 CAACCGCATT ACAAGACAGT GCTTTTGCGG TAAAAGACGG TATCAATTCC 

901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACTGCAA CAGCCCAAAC 

951 TGCCCTTGCC GTAGCAGANG CCGCAACTAC GGTTTGGGGC GGTAAAAAAG 

1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC NGGCTATAAN 

1051 ACACCTGCTG TTCGCACCAT GCATACTTTG GATGGGGAAA TGGCCGGTGG 

1101 GAATAGACCG CCTAAATCTA TAACGTCCAA CAGCAAAGCA GATGCTTCCA 

1151 CACAACCGTC TTTACAAGCG CAACTAATTG GAGAACAAAT TAKNNNNGGG 

1201 CATGCTTATA ACAAGCATGT CATAAGACAA CAAGAATTTA CGGATTTAAA 

1251 TATCAATTCA CCAGCAGATT TTGCTCGGCA TATTGAAAAT ATTGTTAGCC 

1301 ATCCANCAAA TATGAAAGAG TTACCTCGCG GTAGAACTGC GTATTGGGAT 

1351 NATAAAACAG GGACNATAGT TATCCGAGAT AAAAATTCTG ACGATGGAGG 

1401 TACAGCATTT AGACCAACAT CAGGTAAAAA ATATTATGAT GATTTATAG 



This encodes a protein having amino acid sequence <SEQ ID 168>: 

1 MNXPIQKFMM LFAAAISXLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL 
51 FGNARGSVKN RVYAVQTFDA TAVGPILPIT HERTGFEGII GYETHFSGHG 
101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS 
151 DYPPPGGARD IYXXYVKGTS TKTKSNIVPR APFSDRWLKE NAGAASGFFS 
201 RADEAGKLIW ESDPNKNWWA NRMDDIRGIV QGAVNPFLMG FQGVGIGAIT 
251 DSAVSPVTDT AAQQTLQGXN HLGXLSPEAQ LAAATALQDS AFAVKDGINS 
301 ARQWADAHPN ITATAQTALA VAXAATTVWG GKKVELNPTK WDWVKNTGYX 
351 TPAVRTMHTL DGEMAGGNRP PKSITSNSKA DASTQPSLQA QLIGEQIXXG 
401 HAYNKHVIRQ QEFTDLNINS PADFARHIEN IVSHPXNMKE LPRGRTAYWD 
451 XKTGTIVIRD KNSDDGGTAF RPTSGKKYYD DL* 



ORF29a and ORF29-1 show 90.1% identity in 385 aa overlap: 

10 20 30 40 50 60 

orf 29a . pep MNXPIQKFMMLFAAAISXLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN 
|| | | | | I I I I I i II I I I I I I t i I I I I I I ! I i I 1 I I I I I t I I I I 1 I I I M I ! M I I I t : 
orf 29-1 MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf29a.peo RVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG 
} M I I I M I ! I I h 1 : 1 1 1 I t I I I I I M : (I I I I i I t I t I I I I I I I I I : t 1 I i i I t I t 1 I 
orf 29-1 RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 29a . pep GVDGGFTVYQLHRTG3EIHPEDGYDGPQGSDYPPPGGARDIYXXYVKGTSTKTKSNIVPR 

MIMilllllllMIIMIIMIllMMlil ill! I I M I I M t I : I I I I : 

orf 29-1 GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYSYYVKGTSTKTKTNIVPQ 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 29a . pep APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDIRGIVQGAVNPFLMG 
I | M | M | | | I I I I I I I II I I II I I I I I I I I II I I I I I I I I I I I I : I I I I I I I I I I I I I I 
orf 29-1 APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDVRGIVQGAVNPFLMG 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 29a. pep FQGVG I GAITDSAVSPVTDTAAQQTLQGXNHLGXLSPEAQLAAATALQDS AFAVKDGINS 

I I | | | | | | II I I I I I I I I I I I I I I II I I I II I I I I ! I I I I I : I I I I I I! II I I I I I 
orf 29-1 FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS 

250 260 270 280 290 300 

310 320 330 340 350 360 

or f 29a . pep ARQWADAH PN ITATAQTALAVAXAATT VWGGKKVE LN PTKW DWVKNTG YXT PAVRTMHTL 

I : | | | | | | I I I I I II I I I I : : I I I III I I I I I I I I I I I I I I I I I I I I I : I 1:11 
or f 2 9- 1 AKQWADAH PN I T AT AQT ALS AAEAAGT VWRGKKVE LN PTKWDWVKNTGYKKPAARHMQTL 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 2 9a. pep DGEMAGGNRPPKSITSNSKADASTQPSLQAQLIGEQIXXGHAYNKHVIRQQEFTDLNINS 

11111111:111: Ml: I 
orf 29-1 DGEMAGGNKPIKSLP-NSAAEKRKQNFEKFNSNWSSASFDSVHKTLTPNAPGILSPDKVK 

370 380 390 400 410 
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Homology with a pr edicted ORF from hi F onorrhom 

ORF29 shows 88.8% identity over a 125aa overlap with a predicted ORF (ORF29.ng) from N. 



gonorrhoeae: 



VSPVLPITHERTGFEGVIGYETHFSGHGHE 30 
orf29.pep 1 : 1 : 1 1 1 1 1 | | | 1 1 I I I I I M I I I 1 1 1 I I J _ 

orf29ng EPGGKYHLFGNARGSVKNRVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHE 102 

0rf29 pep VHSPFDHHDSKSTSDFSGGVDGGFTWQUiRTWSEIHPEDEYDG^^ 90 
10 orf29ng VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGGGYPPPGGARDIY 162 

orf29 pep SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG 125 

i i . . II | | | M 1 s M | II II II M : I I I I I I I I 
orf29ng syAiKGTTSTKTKIN^ 222 

The complete length ORF29ng nucleotide sequence <SEQ ID 169> is predicted to encode a protein
having amino acid sequence <SEQ ID 170>:

1 MNLPIQKFMM LFAAAISLLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL
51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG
101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGG
151 GYPPPGGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS
201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGLGVGAIT
251 DSAVSPVTYA AARKTLQGIH NLGNLSPEAQ LAAATALQDS AFAVKDSINS
301 ARQWADAHPN ITATAQTALA VTEAATTVWG GKKVELNPAK WDWVKNTGYK
351 KPAARHMQTV DGEMAGGNKP LESKNTVTTN NFFENTGYTE KVLRQASNGD
401 YHGFPQSVDA FSENGTVIQI VGGDNIVRHK LYIPGSYKGK DGNFEYIREA
451 DGKINHRLFV PNQQLPEK*

In a second experiment, the following DNA sequence <SEQ ID 171> was identified: 

1 atgAATTTGC CTATTCAAAA ATTCATGATG ctgttggcAg cggcaatatc
51 gatgctGCat ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC
101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGCAA ATACCATCTG
151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTGCG CCGTCCAAAC
201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA
251 CAGGATTTGA AGGTGTTATC GGCTATGAAA CCCATTTTTC AGGACACGGA
301 CACGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA
351 TTTCAGCGGC GGCGTAGACG GCGGTTTTAC CGTTTACCAA CTTCATCGGA
401 CAGGGTCGGA AATACATCCC GCAGACGGAT ATGACGGGCC TCAAGGCGGC
451 GGTTATCCGG AACCACAAGG GGCAAGGGAT ATATACAGCT ACCATATCAA
501 AGGAACTTCA ACCAAAACAA AGATAAACAC TGTTCCGCAA GCCCCTTTTT
551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCTTCCGG TTTTCTCAGC
601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAACGACC CCGATAAAAA
651 TTGGCGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG
701 TTAATCCTTT TTTAACGGGT TTTCAAGGGG TAGGGATTGG GGCAATTACA
751 GACAGTGCGG TAAGCCCGGT CACAGATACA GCCGCTCAGC AGACTCTACA
801 AGGTATTAAT GATTTAGGAA ATTTAAGTCC GGAAGCACAA CTTGCCGCCG
851 CGAGCCTATT ACAGGACAGT GCCTTTGCGG TAAAAGACGG CATCAATTCC
901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACAGCAA CAGCCCAAAC
951 TGCCCTTGCC GTAGCAGAGG CCGCAGGTAC GGTTTGGCGC GGTAAAAAAG
1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC CGGCTATAAA
1051 AAACCTGCTG CCCGCCATAT GCAGACTGTA GATGGGGAGA TGGCAGGGGG
1101 GAATAGACCG CCTAAATCTA TAACGTCGGA AGGAAAAGCT AATGCTGCAA
1151 CCTATCCTAA GTTGGTTAAT CAGCTAAATG AGCAAAACTT AAATAACATT
1201 GCGGCTCAAG ATCCAAGATT GAGTCTAGCT ATTCATGAGG GTAAAAAAAA
1251 TTTTCCAATA GGAACTGCAA CTTATGAAGA GGCAGATAGA CTAGGTAAAA
1301 TTTGGGTTGG TGAGGGTGCA AGACAAACTA GTGGAGGCGG ATGGTTAAGT
1351 AGAGATGGCA CTCGACAATA TCGGCCACCA ACAGAAAAAA AATCACAATT
1401 TGCAACTACA GGTATTCAAG CAAATTTTGA AACTTATACT ATTGATTCAA
1451 ATGAAAAAAG AAATAAAATT AAAAATGGAC ATTTAAATAT TAGGTAA

This encodes a protein having amino acid sequence <SEQ ID 172; ORF29ng-1>:

1 MNLPIQKFMM LLAAAISMLH IPISHANGLD ARLRDDMQAK HYEPGGKYHL
51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG
101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP ADGYDGPQGG
151 GYPEPQGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS
201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGVGIGAIT
251 DSAVSPVTDT AAQQTLQGIN DLGNLSPEAQ LAAASLLQDS AFAVKDGINS
301 ARQWADAHPN ITATAQTALA VAEAAGTVWR GKKVELNPTK WDWVKNTGYK
351 KPAARHMQTV DGEMAGGNRP PKSITSEGKA NAATYPKLVN QLNEQNLNNI
401 AAQDPRLSLA IHEGKKNFPI GTATYEEADR LGKIWVGEGA RQTSGGGWLS
451 RDGTRQYRPP TEKKSQFATT GIQANFETYT IDSNEKRNKI KNGHLNIR*



ORF29ng-1 and ORF29-1 show 86.0% identity in 401 aa overlap:

                       10        20        30        40        50        60
orf29ng-1.pep  MNLPIQKFMMLLAAAISMLHIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN
orf29-1        MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKK
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf29ng-1.pep  RVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG
orf29-1        RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG
                       70        80        90       100       110       120

                      130       140       150       160       170       180
orf29ng-1.pep  GVDGGFTVYQLHRTGSEIHPADGYDGPQGGGYPEPQGARDIYSYHIKGTSTKTKINTVPQ
orf29-1        GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYSYYVKGTSTKTKTNIVPQ
                      130       140       150       160       170       180

                      190       200       210       220       230       240
orf29ng-1.pep  APFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANRMDDIRGIVQGAVNPFLTG
orf29-1        APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDVRGIVQGAVNPFLMG
                      190       200       210       220       230       240

                      250       260       270       280       290       300
orf29ng-1.pep  FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGNLSPEAQLAAASLLQDSAFAVKDGINS
orf29-1        FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS
                      250       260       270       280       290       300

                      310       320       330       340       350       360
orf29ng-1.pep  ARQWADAHPNITATAQTALAVAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPAARHMQTV
orf29-1        AKQWADAHPNITATAQTALSAAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPAARHMQTL
                      310       320       330       340       350       360

                      370        380       390       400       410      419
orf29ng-1.pep  DGEMAGGNRPPKSI-TSEGKANAATYPKLVNQLNEQNLNNIAAQDPRLSLAIHEGKKNFP
orf29-1        DGEMAGGNKPIKSLPNSAAEKRKQNFEKFNSNWSSASFDSVHKTLTPNAPGILSPDKVKT
                      370       380       390       400       410       420

               420     430       440       450       460       470      479
orf29ng-1.pep  IGTATYEEADRLGKIWVGEGARQTSGGGWLSRDGTRQYRPPTEKKSQFATTGIQANFETY
orf29-1        RYTSLDGKITIIKDNENNYFRIHDNSRKQYLDSNGNAVKTGNLQGKQAKDYLQQQTHIRN
                      430       440       450       460       470       480


Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 21 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 173>:

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAATGTTCC
101 ACACGCGGGC AGATGCACCG ATGCAG...

This corresponds to the amino acid sequence <SEQ ID 174; ORF30>:

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QMFHTRADAP MQ..
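
The correspondence between the partial DNA sequence <SEQ ID 173> and the ORF30 amino acid sequence can be checked by translating the first reading frame with the standard genetic code. The following Python sketch is illustrative only and is not the software used in this work; the function name `translate` and the table construction are assumptions made for the example, and the standard genetic code is assumed to apply.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# (assumption: the standard table is appropriate for this organism).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate reading frame 1; codons with ambiguous bases become 'X'."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing incomplete codon
    return "".join(CODON_TABLE.get(seq[p:p + 3], "X") for p in range(0, usable, 3))


# Partial N. meningitidis sequence <SEQ ID 173>, copied from the listing above.
SEQ_ID_173 = """
ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC
CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAATGTTCC
ACACGCGGGC AGATGCACCG ATGCAG
"""

print(translate(SEQ_ID_173))
```

Run on the 126 nucleotides shown, the sketch yields the 42 residues listed as <SEQ ID 174>. Ambiguous bases (for example the N positions in the strain A sequences) translate to X here, which mirrors the X placeholders used in the listings.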

Further work revealed the complete nucleotide sequence <SEQ ID 175>:

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC
101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG
151 ATGAAGGAGA CAGAGGGGGC GTTTCTTCCA TTGGCTATCT TGGGTGGTGC
201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA
251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT
301 CCTGGTGGTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG
351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA
401 GAACAGGTCA TCCTATTGGA AAATTTCCCC ATTATCATCG TCGAGTTACG
451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC
501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA

This corresponds to the amino acid sequence <SEQ ID 176; ORF30-1>:

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE
51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI
101 PGGVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT
151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF30 shows 97.6% identity over a 42aa overlap with an ORF (ORF30a) from strain A of N.
meningitidis:

                       10        20        30        40
orf30.pep      MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ
orf30a         MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP
                       10        20        30        40        50        60

orf30a         LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI
                       70        80        90       100       110       120

The complete length ORF30a nucleotide sequence <SEQ ID 177> is:

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC
101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG
151 ATGAAGGANA CAGNGGGGGC GTTTCTTCCA TTGGNTATCT TGGGTGGTGC
201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA
251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT
301 CCTGGTGNTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG
351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA
401 GAACAGGTCA TCCTATTGGN AAATTTCCCC ATTATCATCG TCGAGTTACG
451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC
501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA

This encodes a protein having amino acid sequence <SEQ ID 178>:

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE
51 MKXTXGAFLP LXILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI
101 PGXVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT
151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F*

ORF30a and ORF30-1 show 97.8% identity in 181 aa overlap:

orf30a.pep     MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP   60
orf30-1        MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP   60

orf30a.pep     LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI  120
orf30-1        LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI  120

orf30a.pep     KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR  180
orf30-1        KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR  180

orf30a.pep     FX
orf30-1        FX
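
The identity figure quoted for ORF30a and ORF30-1 can be reproduced by counting identical columns in the ungapped alignment. The following Python sketch is illustrative and is not the alignment program used in this work; `percent_identity` is a hypothetical helper, the sequences are copied from <SEQ ID 178> and <SEQ ID 176> without the stop codon, and X placeholders are assumed to count as mismatches.

```python
def percent_identity(a, b):
    """Percent of aligned columns holding the same residue.

    Both inputs must already be aligned to the same length; a gap ('-')
    never counts as a match.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# ORF30-1 <SEQ ID 176>, stop codon omitted.
ORF30_1 = (
    "MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKE"
    "MKETEGAFLPLAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAI"
    "PGGVGAAGKVVSFAKYGREIKIGNNMRIAPFGNRTGHPIGKFPHYHRRVT"
    "DNTGKTLPGQGIGRHRPWESKSTDRSWKNRF"
)

# ORF30a <SEQ ID 178>, stop codon omitted; X marks an undetermined residue.
ORF30A = (
    "MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKE"
    "MKXTXGAFLPLXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAI"
    "PGXVGAAGKVVSFAKYGREIKIGNNMRIAPFGNRTGHPIGKFPHYHRRVT"
    "DNTGKTLPGQGIGRHRPWESKSTDRSWKNRF"
)

identity = percent_identity(ORF30A, ORF30_1)
print(f"{identity:.1f}% identity in {len(ORF30A)} aa overlap")
```

Under these assumptions the only differing columns are the four X positions in ORF30a, giving 177 identical residues out of 181.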



Homology with a predicted ORF from N. gonorrhoeae

ORF30 shows 97.6% identity over a 42aa overlap with a predicted ORF (ORF30.ng) from N.
gonorrhoeae:

orf30.pep      MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ                     42
orf30ng        MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP   60



The complete length ORF30ng nucleotide sequence <SEQ ID 179> is:

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATCGCCCC
51 CGCAATGGCA AACGGATTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC
101 ACACGCGGGC AGATGCGCCG ATGCAGTTGG CGGAGCTTTC TCAGAAGGAG
151 ATGAAGGAGA CTGAAGGGGC TTTTCTTCCA TTGGCTATCT TGGGTGGTGC
201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA
251 GACCAGCTTC TGTTAGAGAT GTTGCTGGCG GATTAGGCGC AATTCCTGGT
301 GATGTAGGTG CTGCAGGAAA GGTTGTTTCC TTTGCTAAAT ATGGACGTGA
351 GATTAAAATC GGCAATAATA TGCGGATAGC CCCTTTCGGT AATAGAACAG
401 GTCATCCTAT TGGAAAATTT CCCCATTATC ATCGTCGAGT TACGGATAAT
451 ACGGGCAAGA CTTTGCCTGG ACAGGGAATT GGTCGTCATC GCCCTTGGGA
501 ATCAAAATCT ACGGACAGAT CATGGAAAAA CCGCTTCTAA



This encodes a protein having amino acid sequence <SEQ ID 180>:

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE
51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAGGLGAIPG
101 DVGAAGKVVS FAKYGREIKI GNNMRIAPFG NRTGHPIGKF PHYHRRVTDN
151 TGKTLPGQGI GRHRPWESKS TDRSWKNRF*

ORF30ng and ORF30-1 show 98.3% identity in 181 aa overlap:

                       10        20        30        40        50        60
orf30ng.pep    MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP
orf30-1        MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP
                       10        20        30        40        50        60

                       70        80        90         100       110
orf30ng.pep    LAILGGAAIGMWTQHGFSYATTGRPASVRDVA--GGLGAIPGDVGAAGKVVSFAKYGREI
orf30-1        LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI
                       70        80        90       100       110       120

               120      130       140       150       160       170
orf30ng.pep    KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR
orf30-1        KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR
                      130       140       150       160       170       180

               180
orf30ng.pep    FX
orf30-1        FX

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 22 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 181>:

1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT 

51 GrTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT 

151 GCACCTGTTT GTg.CGTTaC AAATATCTTT TCTTTTTCTT TATTGGGCTT 

201 TTCTTTATGT TTGGCTGTAG GtacGGyCAA TATTGCTTTT GCTGATGGCA

251 TT . . 

This corresponds to the amino acid sequence <SEQ ID 182; ORF31>: 

1 MNKTLYRVIF NRKRGAVXAV AETTKREGKS CADSDSGSAH VKSVPFGTTH 
51 APVCXVTNIF SFSLLGFSLC LAVGTXNIAF ADGI . . 

Further work revealed a further partial nucleotide sequence <SEQ ID 183>:

1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT
51 GGTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA
101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT
151 GCACCTGTTT GTCGTTCAAA TATCTTTTCT TTTTCTTTAT TGGGCTTTTC
201 TTTATGTTTG GCTGTAGGTA CGGCCAATAT TGCTTTTGCT GATGGCATT..

This corresponds to the amino acid sequence <SEQ ID 184; ORF31-1>:

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSDSGSAH VKSVPFGTTH
51 APVCRSNIFS FSLLGFSLCL AVGTANIAFA DGI..

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. gonorrhoeae

ORF31 shows 76.2% identity over a 84aa overlap with a predicted ORF (ORF31.ng) from N.
gonorrhoeae:

orf31.pep      MNKTLYRVIFNRKRGAVXAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCXVTNIF   60
orf31ng        MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSGSGSVYVKSVSFIPTH------SKAF   54

orf31.pep      SFSLLGFSLCLAVGTXNIAFADGI                                       84
orf31ng        CFSALGFSLCLALGTVNIAFADGIITDKAAPKTQQATILQTGNGIPQVNIQTPTSAGVSV  114

The complete length ORF31ng nucleotide sequence <SEQ ID 185> is:

1 ATGAACAAAA CCCTCTATCG TGTGATTTTC AACCGCAAAC GCGGTGCTGT
51 GGTAGCTGTT GCCGAAACCA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA
101 GTGGTTCGGG CAGCGTTTAT GTGAAATCCG TTTCTTTCAT TCCTACTCAT
151 TCCAAAGCCT TTTGTTTTTC TGCATTAGGC TTTTCTTTAT GTTTGGCTTT
201 GGGTACGGTC AATATTGCTT TTGCTGACGG CATTATTACT GATAAAGCTG
251 CTCCTAAAAC CCAACAAGCC ACGATTCTGC AAACAGGTaa CGGCATACCG
301 CAAGTCAATA TTCAAACCCC TACTTCGGCA GGGGTTTCTG TTAATCAATA
351 TGCCCAGTTT GATGTGGGTA ATCGCGGGGC GATTTTAAAC AACAGTCGCA
401 GCAACACCCA AACACAGCTA GGCGGTTGGA TTCAAGGCAA TCCTTGGTTG
451 ACAAGGGGCG AAGCACGTGT GGTTGTAAAC CAAATCAACA GCAGCCATCC
501 TTCACAACTG AATGGCTATA TTGAAGTGGG TGGACGACGT GCAGAAGTCG
551 TTATTGCCAA TCCGGCAGGG ATTGCAGTCA ATGGTGGTGG TTTTATCAAT
601 GCTTCCCGTG CCACTTTGAC GACAGGCCAA CCGCAATATC AAGCAGGAGA
651 CTTTAGCGGC TTTAAGATAA GGCAAGGCAA TGCTGTAATC GCCGGACACG
701 GTTTGGATGC CCGTGATACC GATTTCACAC GTATTCTTGT ATGCCAACAA
751 AATCACCTTG ATCAGTACGG CCGAACAAGC AGGCATTCGT AA

This encodes a protein having amino acid sequence <SEQ ID 186>:

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH
51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP
101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL
151 TRGEARVVVN QINSSHPSQL NGYIEVGGRR AEVVIANPAG IAVNGGGFIN
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ
251 NHLDQYGRTS RHS*

This gonococcal protein shares 50% identity over a 149aa overlap with the pore-forming 
hemolysins-like HecA protein from Erwinia chrysanthemi (accession number L39897): 

Orf31ng  96 GNGIPQVNIQTPTSAGVSVNQYAQFDVGNRGAILNNSRSN-TQTQLGGWIQGNPWLTRGE 154
            GNG+P VNI TP ++G+S N+Y  F+V NRG ILNN  +  T +QLGG IQ NP L
HecA     45 GNGVPVVNIATPDASGLSHNRYHDFNVDNRGLILNNGTARLTPSQLGGLIQNNPNLNGRA 104

Orf31ng 155 ARVVVNQINSSHPSQLNGYIEVGGRRAEVVIANPAGIAVNGGGFINASRATLTTGQPQYQ 214
            A  ++N++ S + S+L GY+EV G+ A VV+ANP GI  +G GF+N  R TLTTG PQ+
HecA    105 AAAILNEVVSPNRSRLAGYLEVAGQAANVVVANPYGITCSGCGFLNTPRLTLTTGTPQFD 164

Orf31ng 215 -AGDFSGFKIRQGNAVIAGHGLDARDTDF 242
             AG  SG  +R G+ +I G GLDA  +D+
HecA    165 AAGGLSGLDVRGGDILIDGAGLDASRSDY 193

Furthermore, ORF31ng and ORF31-1 show 79.5% identity in 83 aa overlap:

                       10        20        30        40        50        60
orf31-1.pep    MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCRSNIFS
orf31ng        MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSGSGSVYVKSVSFIPTH-----SKAFC
                       10        20        30        40        50

                       70        80
orf31-1.pep    FSLLGFSLCLAVGTANIAFADGI
orf31ng        FSALGFSLCLALGTVNIAFADGIITDKAAPKTQQATILQTGNGIPQVNIQTPTSAGVSVN
                  60        70        80        90       100       110

On this basis, including the homology with hemolysins, and also with adhesins, it is predicted that
the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens
for vaccines or diagnostics, or for raising antibodies.



Example 23 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 187>: 

1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG
101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT
151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA
201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCG..

This corresponds to the amino acid sequence <SEQ ID 188; ORF32>:

1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR 
51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT A. . 

Further work revealed the complete nucleotide sequence <SEQ ID 189>:

1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG
101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT
151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA
201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC
251 CCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG
301 CACATTATCC GCCGACACAA GCCGCTTTGG CTGAATTGGG AATATTTGAG
351 CGCGGAGGAA AGCAATGAAA GGCTGCATCT GATGCCTTCG CCGCAGGAGG
401 GTGTTCAAAA ATATTTTTGG TTTATGGGTT TCAGCGAAAA AAGCGGCGGG
451 TTGATACGCG AACGTGATTA CTGCGAAGCC GTCCGTTTCG ATACTGAAGC
501 CCTGCGAGAG CGGCTGATGC TGCCCGAAAA AAACGCCTCC GAATGGCTGC
551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA
601 CAGGCAGGCA GCCCGATGAC ACTGTTGCTG GCGGGGACGC AAATCATCGA
651 CAGCCTCAAA CAAAGCGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG
701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG
751 CCGCAACAGG ACTTCGACCA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT
801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT
851 TTTGGCACAT CTACCCGCAA GACGAGAATG TCCATCTCGA CAAACTCCAC
901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGTGTCGGC
951 ACACCGCCGT CTTTCGGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA
1001 CACAACGCCT CGAATGTTGG CAAACCCTGC AACAACATCA AAACGGCTGG
1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTCGGGC AGCCGTCAGC
1101 TCCTGAAAAA CTCGCTGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG

This corresponds to the amino acid sequence <SEQ ID 190; ORF32-1>:

1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR
51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT APVPDVVIET FACDLPENVL
101 HIIRRHKPLW LNWEYLSAEE SNERLHLMPS PQEGVQKYFW FMGFSEKSGG
151 LIRERDYCEA VRFDTEALRE RLMLPEKNAS EWLLFGYRSD VWAKWLEMWR
201 QAGSPMTLLL AGTQIIDSLK QSGVIPQDAL QNDGDVFQTA SVRLVKIPFV
251 PQQDFDQLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH
301 AFWDKAHGFY TPETVSAHRR LSDDLNGGEA LSATQRLECW QTLQQHQNGW
351 RQGAEDWSRY LFGQPSAPEK LAAFVSKHQK IR*

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF32 shows 93.8% identity over a 81aa overlap with an ORF (ORF32a) from strain A of N.
meningitidis:

                       10        20        30        40        50        60
orf32.pep      MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP
orf32a         MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX
                       10        20        30        40        50        60

                       70        80
orf32.pep      CVHQDIHVRTWHSDAADIDTA
orf32a         CVHQDIHVRTWHSDAADIDTAPVXDVVIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX
                       70        80        90       100       110       120

The complete length ORF32a nucleotide sequence <SEQ ID 191> is:

1 ATGAATACTC CTCCTTTTTC TGCTGGANTT TTTTGCAAGG TCATCGACAA
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT TGCCCGTGTT TTGCACCGCG
101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT
151 GCGCTTTGCC CTGATTTGCC CGATGTTCNC TGCGTTCATC AGGATATTCA
201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC
251 NCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG
301 CACATCATCC GCCGACACAA GCCGCTTTGG CTGAANTGGG AATATTTGAG
351 CGCGGAGGAN AGCAATGAAA GGCTGCACNT GATGCCTTCG CCGCAGGAGA
401 GTGTTCNAAA ATANTTTTGG TTTATGGGTT TCAGCGAANN NAGCGGCGGA
451 CTGATACGCG AACGCGATTA CTGCGAAGCC GTCCGTTTCG ATAGCGGAGC
501 CTTGCGCAAG AGGCTGATGC TTCCCGAAAA AAACGNCCCC GAATGGCTGC
551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA
601 CAGGCAGGCA GTCCGTTGAC ACTTTTGCTG GCNGGGGCGC ANATTATCGA
651 CAGCCTCAAA CAAAACGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG
701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG
751 CCGCAACAGG ACTTCGACAA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT
801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT
851 TTTGGCACAT CTACCCGCAA GATGAGAATG TCCATCTCGA CAAACTCCAC
901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGCATCGGC
951 ACACCGCCGC CTTTCAGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA
1001 CACAACGCCT CGAATGTTGG CAAATCCTGC AACAACATCA AAACGGCTGG
1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTTGGGC AGCCTTCCGC
1101 ATCCGAAAAA CTCGCCGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG



This encodes a protein having amino acid sequence <SEQ ID 192>:

1 MNTPPFSAGX FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR
51 ALCPDLPDVX CVHQDIHVRT WHSDAADIDT APVXDVVIET FACDLPENVL
101 HIIRRHKPLW LXWEYLSAEX SNERLHXMPS PQESVXKXFW FMGFSEXSGG
151 LIRERDYCEA VRFDSGALRK RLMLPEKNXP EWLLFGYRSD VWAKWLEMWR
201 QAGSPLTLLL AGAXIIDSLK QNGVIPQDAL QNDGDVFQTA SVRLVKIPFV
251 PQQDFDKLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH
301 AFWDKAHGFY TPETASAHRR LSDDLNGGEA LSATQRLECW QILQQHQNGW
351 RQGAEDWSRY LFGQPSASEK LAAFVSKHQK IR*



ORF32a and ORF32-1 show 93.2% identity in 382 aa overlap:

                       10        20        30        40        50        60
orf32-1.pep    MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP
orf32a         MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf32-1.pep    CVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVLHIIRRHKPLWLNWEYLSAEE
orf32a         CVHQDIHVRTWHSDAADIDTAPVXDVVIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX
                       70        80        90       100       110       120

                      130       140       150       160       170       180
orf32-1.pep    SNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNAS
orf32a         SNERLHXMPSPQESVXKXFWFMGFSEXSGGLIRERDYCEAVRFDSGALRKRLMLPEKNXP
                      130       140       150       160       170       180

                      190       200       210       220       230       240
orf32-1.pep    EWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQTA
orf32a         EWLLFGYRSDVWAKWLEMWRQAGSPLTLLLAGAXIIDSLKQNGVIPQDALQNDGDVFQTA
                      190       200       210       220       230       240

                      250       260       270       280       290       300
orf32-1.pep    SVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH
orf32a         SVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH
                      250       260       270       280       290       300

                      310       320       330       340       350       360
orf32-1.pep    AFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSRY
orf32a         AFWDKAHGFYTPETASAHRRLSDDLNGGEALSATQRLECWQILQQHQNGWRQGAEDWSRY
                      310       320       330       340       350       360

                      370       380
orf32-1.pep    LFGQPSAPEKLAAFVSKHQKIRX
orf32a         LFGQPSASEKLAAFVSKHQKIRX
                      370       380



Homology with a predicted ORF from N. gonorrhoeae

ORF32 shows 95.1% identity over a 82aa overlap with a predicted ORF (ORF32.ng) from N.
gonorrhoeae:

orf32.pep        MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP   57
orf32ng        MVMNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP   60

orf32.pep      DVPCVHQDIHVRTWHSDAADIDTA                                       81
orf32ng        DVPFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLS  120

An ORF32ng nucleotide sequence <SEQ ID 193> was predicted to encode a protein having amino
acid sequence <SEQ ID 194>:

1 MVMNTYAFPV CWIFCKVIDN FGDIGVSWRL ARVLHRELGW QVHLWTDDVS 

51 ALRALCPDLP DVPFVHQDIH VRTWHSDAAD IDTAPVPDAV IETFACDLPE 

101 NVLNIIRRHK PLWLNWEYLS AEESNERLHL MPSPQEGVQK YFWFMGFSEK

151 SGGLIRERDY REAVRFDTEA LRRRLVLPEK NAPEWLLFGY RGDVWAKWLD 

201 MWQQAGSLMT LLLAGAQIID SLKQSGVIPQ NALQNEGGVF QTASVRLVKI 

251 PFVPQQDFDK LLHLADCAVI RGEDSFVRTQ LAGKPFFWHI YPQDENVHLD 

301 KLHAFWDKAY GFYTPETASV HRLLSDDLNG GEALSATQRL ECGVL* 

Further sequencing revealed the following DNA sequence <SEQ ID 195>:

1 ATGAATACAT ACGCTTTTCC TGTCTGTTGG ATTTTTTGCA AGGTCATCGA 

51 CAATTTCGGC GACATCGGCG TTTCGTGGCG GCTCGCCCGT GTTTTGCACC 

101 GCGAACTCGG TTGGCAGGTG CATTTGTGGA CGGACGACGT GTCCGCCTTG 

151 CGCGCGCTTT GTCCCGATTT GCCCGATGTT CCCTTCGTTC ATCAGGATAT 

201 TCATGTCCGC ACTTGGCATT CCGATGCGGC AGACATTGAT ACCGCGCCCG

251 TTCCCGATGC CGTTATCGAA ACTTTTGCCT GCGACCTGCC CGAAAATGTG 

301 CTGAACATCA TCCGCCGACA CAAACCGCTT TGGCTGAATT GGGAATATTT 

351 GAGCGCGGAG GAAAGCAATG AAAGGCTGCA CCTGATGCCT TCGCCGCAGG 

401 AGGGCGTTCA AAAATATTTT TGGTTTATGG GTTTCAGCGA AAAAAGCGGC 

451 GGGTTGATAC GCGAACGCGA TTACCGCGAA GCCGTCCGTT TCGATACCGA

501 AGCCCTGCGC CGGCGGCTGG TGCTGCCCGA AAAAAACGCC CCCGAATGGC 

551 TGCTTTTCGG CTATCGGGGC GATGTTTGGG CAAAGTGGCT GGACATGTGG 

601 CAACAGGCAG GCAGCCTGAT GACCCTACTG CTGGCGGGGG CGCAAATTAT 

651 CGACAGCCTC AAACAAAGCG GCGTTATTCC GCAAAACGCC CTGCAAAAtg 

701 aaggcgGTGT CTTTCagacG gcatccgTcC gccttGTCAA AAtcCCGTTC

751 GTGCcGCAAC AGGAcTTCGA CAAATTGCTG CAcctcgcCG ACTGCGCCGT 

801 GATACGCGGC GAAGACAGTT TCGTGCGTAC CCAGCTTGCC GGAAAACCCT 

851 TTTTTTGGCA CATCTACCCG CAAGACGAGA ATGTCCATCT CGACAAACTC 

901 CACGCCTTTT GGGATAAGGC ATACGGCTTC TACACGCCCG AAACCGCATC 

951 GGTGCACCGC CTCCTTTCGG ACGACCTCAA CGGCGGAGAG GCTTTATCCG

1001 CAACACAACG CCTCGAATGT TGGCAAACCC TGCAACAACA TCAAAACGGC 

1051 TGGCGGCAAG GCGCGGAGGA TTGGAGCCGT TATCTTTTCG GGCAGCCTTC 

1101 CGCATCCGAA AAACTCGCCG CCTTTGTTTC AAAGCATCAA AAAATACGCT 

1151 AG 

This encodes a protein having amino acid sequence <SEQ ID 196; ORF32ng-1>:

1 MNTYAFPVCW IFCKVIDNFG DIGVSWRLAR VLHRELGWQV HLWTDDVSAL 

51 RALCPDLPDV PFVHQDIHVR TWHSDAADID TAPVPDAVIE TFACDLPENV 

101 LNIIRRHKPL WLNWEYLSAE ESNERLHLMP SPQEGVQKYF WFMGFSEKSG 

151 GLIRERDYRE AVRFDTEALR RRLVLPEKNA PEWLLFGYRG DVWAKWLDMW 

201 QQAGSLMTLL LAGAQIIDSL KQSGVIPQNA LQNEGGVFQT ASVRLVKIPF

251 VPQQDFDKLL HLADCAVIRG EDSFVRTQLA GKPFFWHIYP QDENVHLDKL 

301 HAFWDKAYGF YTPETASVHR LLSDDLNGGE ALSATQRLEC WQTLQQHQNG

351 WRQGAEDWSR YLFGQPSASE KLAAFVSKHQ KIR* 

ORF32ng-1 and ORF32-1 show 93.5% identity in 383 aa overlap:

                        10        20        30        40        50       59
orf32-1.pep    MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV
orf32ng-1      MNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV
                       10        20        30        40        50        60

               60       70        80        90       100       110      119
orf32-1.pep    PCVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVLHIIRRHKPLWLNWEYLSAE
orf32ng-1      PFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLSAE
                       70        80        90       100       110       120

               120     130       140       150       160       170      179
orf32-1.pep    ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNA
orf32ng-1      ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYREAVRFDTEALRRRLVLPEKNA
                      130       140       150       160       170       180

               180     190       200       210       220       230      239
orf32-1.pep    SEWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQT
orf32ng-1      PEWLLFGYRGDVWAKWLDMWQQAGSLMTLLLAGAQIIDSLKQSGVIPQNALQNEGGVFQT
                      190       200       210       220       230       240

               240     250       260       270       280       290      299
orf32-1.pep    ASVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKL
orf32ng-1      ASVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRTQLAGKPFFWHIYPQDENVHLDKL
                      250       260       270       280       290       300

               300     310       320       330       340       350      359
orf32-1.pep    HAFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR
orf32ng-1      HAFWDKAYGFYTPETASVHRLLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR
                      310       320       330       340       350       360

               360     370       380
orf32-1.pep    YLFGQPSAPEKLAAFVSKHQKIRX
orf32ng-1      YLFGQPSASEKLAAFVSKHQKIRX
                      370       380

On this basis, including the RGD sequence in the gonococcal protein, characteristic of adhesins, 
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 
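
The RGD observation above can be checked with a simple motif scan. The following is a minimal sketch, not the analysis used in this document: the function name `find_motif` is hypothetical, and the two fragments are taken from the ORF32-1/ORF32ng-1 alignment above (residues around position 180 to 200).

```python
def find_motif(sequence: str, motif: str = "RGD") -> list[int]:
    """Return 1-based start positions of every occurrence of motif."""
    sequence = sequence.upper()
    motif = motif.upper()
    width = len(motif)
    return [
        start + 1
        for start in range(len(sequence) - width + 1)
        if sequence[start:start + width] == motif
    ]


# Fragments copied from the alignment above (illustrative only).
gonococcal_fragment = "PEWLLFGYRGDVWAKWLDMW"
meningococcal_fragment = "SEWLLFGYRSDVWAKWLEMW"

gonococcal_hits = find_motif(gonococcal_fragment)
meningococcal_hits = find_motif(meningococcal_fragment)
```

On these fragments the scan reports a hit only in the gonococcal sequence, in line with the statement that the RGD sequence is found in the gonococcal protein.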

ORF32-1 (42kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
7A shows the results of affinity purification of the His-fusion protein, and Figure 7B shows the 
results of expression of the GST-fusion in E.coli. Purified His-fusion protein was used to immunise 
mice, whose sera were used for ELISA, giving a positive result. These experiments confirm that 
ORF32-1 is a surface-exposed protein, and that it is a useful immunogen. 

Example 24 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 197>: 

   1 ..TTGTTCCTGC GTGTNAAAGT GGGGCGTTTT TTCAGCAGTC CGGCGACGTG 
  51 GTTTCGGGNC AAAGACCCTG TAAATCAGGC GGTGTTGCGG CTGTATNCGG 
 101 ACGAGTGGCG GCA.ACTTCG GTACGTTGGA AAATAGNCGC AACGTCGCAC 
 151 AGCCTGTGGC TCTGCACGCT GCTCGGAATG CTGGTGTCGG TATTGTTGCT 
 201 GCTTTTGGTG CGGCAATATA CGTTCAACTG GGAAAGCACG CTGTTGAGCA 
 251 ATGCCGCTTC GGTACGCGCG GTGGAAATGT TGGCATGGCT GCCGTCGAAA 
 301 CTCGGTTTCC CTGTCCCCGA TGCGCGGTCG GTCATCGAAG GCCGTCTGAA 
 351 CGGCAATATT GCCGATGCGC GGGCTTGGTC GGGGCTGCTG GTCGNCAGTA 
 401 TCGCCTGCTA NGGCATCCTG CCGCGCCTG.. 

This corresponds to the amino acid sequence <SEQ ID 198; ORF33>: 

   1 ..LFLRVKVGRF FSSPATWFRX KDPVNQAVLR LYXDEWRXTS VRWKIXATSH 
  51 SLWLCTLLGM LVSVLLLLLV RQYTFNWEST LLSNAASVRA VEMLAWLPSK 
 101 LGFPVPDARS VIEGRLNGNI ADARAWSGLL VXSIACXGIL PRL.. 
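
The correspondence between a nucleotide listing and its amino acid listing can be reproduced with a short translation routine. This is a minimal sketch under stated assumptions: it uses the standard genetic code, treats only `N` as an ambiguity symbol, and writes `X` whenever a codon cannot be resolved to a single residue. The names `CODON_TABLE`, `translate_codon` and `translate` are illustrative and are not taken from the software used to prepare this document.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_codon(codon: str) -> str:
    """Translate one codon; 'N' is expanded to all four bases."""
    choices = []
    for base in codon.upper():
        if base in BASES:
            choices.append(base)
        elif base == "N":
            choices.append(BASES)
        else:
            return "X"  # unreadable base such as '.'
    residues = {CODON_TABLE["".join(c)] for c in product(*choices)}
    # An ambiguous codon is only resolved if every expansion agrees.
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring whitespace."""
    dna = "".join(dna.split())
    return "".join(
        translate_codon(dna[i:i + 3]) for i in range(0, len(dna) - 2, 3)
    )


# First 21 bases of SEQ ID 197 (leading dots removed).
partial = translate("TTGTTCCTGC GTGTNAAAGT G")
```

With this handling, `GTN` still translates to valine because all four expansions encode V, whereas a codon such as `NCG` is reported as `X`, which mirrors how `X` appears in the partial sequences above.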






Further work revealed the complete nucleotide sequence <SEQ ID 199>: 

   1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGACGA 
  51 AGGCGGTTTT ATTTTCAGCG GCGATCCCGT ACAGGCGACG GAGGCTTTGC 
 101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGGAGATG 
 151 ATTGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG 
 201 GTCGTTCTGG TTGTGGGTGG TGGCGGCGAC GTTTGCATTT TTTACCGGTT 
 251 TTTCAGTCAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG 
 301 GTTTTGGCGG GCGTGTTGGG CATGAATACG CTGATGCTGG CAGTATGGTT 
 351 GGCAATGTTG TTCCTGCGTG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG 
 401 CGACGTGGTT TCGGGGCAAA GACCCTGTAA ATCAGGCGGT GTTGCGGCTG 
 451 TATGCGGACG AGTGGCGGCA ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC 
 501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 
 551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 
 601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC 
 651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGCC 
 701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC 
 751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTGCTGG CTTGGGTAGT 
 801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGATTGGAT TTGGAAAAGC 
 851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 
 901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCACCGAAAA TCATCTTGAA 
 951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAGTGG CAGGACGGCG 
1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC 
1051 ACCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 
1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCG GACCGCGGCG 
1151 TGTTGCGGCA GATTGTCCGA CTCTCGGAAG CGGCGCAGGG CGGCGCGGTG 
1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT 
1251 GGAACATTGG CGTAACGCGC TGGCCGAATG CGGCGCGGCG TGGCTTGAGC 
1301 CTGACAGGGC GGCGCAGGAA GGGCGTTTGA AAGACCAATA A 



This corresponds to the amino acid sequence <SEQ ID 200; ORF33-1>: 

   1 MLNPSRKLVE LVRILDEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAEM 
  51 IDRNRMLRET LERVRAGSFW LWVVAATFAF FTGFSVTYLL MDNQGLNFFL 
 101 VLAGVLGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL 
 151 YADEWRQPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL 
 201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV 
 251 GSIACYGILP RLLAWVVCKI LLKTSENGLD LEKPYYQAVI RRWQNKITDA 
 301 DTRRETVSAV SPKIILNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA 
 351 TNREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV 
 401 VQLLAEQGLS DDLSEKLEHW RNALAECGAA WLEPDRAAQE GRLKDQ* 



Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF33 shows 90.9% identity over a 143aa overlap with an ORF (ORF33a) from strain A of N. meningitidis: 

orf33.pep                                  LFLRVKVGRFFSSPATWFRXKDPVNQAVLR  30
orf33a       LMDNQGLNFFLVLAGVXGMNTLMLAVWLAMLFLRVKVGRFFSSPATWFRGKDPVNQAVLR  149

orf33.pep    LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA  90
orf33a       LYADEWRXPSVRWKIGATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLGDSSSVRL  209

orf33.pep    VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL  143
orf33a       VEMLAWLPAKLGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIACYGILPRLLAWAVCK  269

orf33a       ILXXTSENGLDLEKXXXXXXIRRWQNKITDADTRRETVSAVSPKIVLNDAPKWAVMLETE  329
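
The identity figure quoted for ORF33 and ORF33a can be reproduced by counting identical columns over the overlap. The sketch below is illustrative only: the program that produced the alignments in this document is not named here, so the conventions used (columns containing a gap `-` count against identity, and `X` paired with `X` counts as identical) are assumptions chosen because they reproduce the quoted value. `percent_identity` is a hypothetical helper name.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# ORF33 <SEQ ID 198> and the overlapping 143 residues of ORF33a <SEQ ID 202>.
orf33 = (
    "LFLRVKVGRFFSSPATWFRXKDPVNQAVLRLYXDEWRXTSVRWKIXATSH"
    "SLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSK"
    "LGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL"
)
orf33a_overlap = (
    "LFLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRXPSVRWKIGATSH"
    "SLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLGDSSSVRLVEMLAWLPAK"
    "LGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIACYGILPRL"
)

identity = percent_identity(orf33, orf33a_overlap)
```

Under these assumptions 130 of the 143 columns are identical, which rounds to the 90.9% stated above.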

The complete length ORF33a nucleotide sequence <SEQ ID 201> is: 

   1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGAAGA 
  51 AGGCGGCTTT ATTTTCAGCG GCGATCCCGT GCAGGCGACG GAGGCTTTGC 
 101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGAAGATG 
 151 ATCGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG 
 201 GTCGTTCTGG TTGTGGGTGG CGGCGGCGAC GTTTGCGTTT NTTACCGNTT 
 251 TTTCAGTTAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG 
 301 GTTTTGGCGG GCGTGNTGGG CATGAATACG CTGATGCTGG CAGTATGGTT 
 351 GGCAATGTTG TTCCTGCGCG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG 
 401 CGACGTGGTT TCGGGGCAAA GACCCTGTCA ATCAGGCGGT GTTGCGGCTG 
 451 TATGCGGACG AGTGGCGGCN ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC 
 501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 
 551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 
 601 TTGGGCGATT CGTCTTCGGT ACGGCTGGTG GAAATGTTGG CATGGCTGCC 
 651 TGCGAAACTG GGTTTTCCCG TGCCTGATGC GCGGGCGGTC ATCGAAGGTC 
 701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC 
 751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGCGGT 
 801 ATGCAAAATC CTTNTGNAAA CAAGCGAAAA CGGCTTGGAT TTGGAAAAGC 
 851 NCNNNNNTCN NNCGNTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 
 901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCCGAAAA TCGTCTTGAA 
 951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAATGG CAGGACGGCG 
1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC 
1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 
1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCC GACCGCGGCG 
1151 TGTTGCGGCA GATCGTCCGA CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG 
1201 GTGCANCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT 
1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTGGAAC 
1301 CCGACAGAGC GGCGCAGGAA GGCCGTCTGA AAACCAACGA CCGCACTTGA 



This encodes a protein having amino acid sequence <SEQ ID 202>: 

   1 MLNPSRKLVE LVRILEEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAKM 
  51 IDRNRMLRET LERVRAGSFW LWVAAATFAF XTXFSVTYLL MDNQGLNFFL 
 101 VLAGVXGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL 
 151 YADEWRXPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL 
 201 LGDSSSVRLV EMLAWLPAKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV 
 251 GSIACYGILP RLLAWAVCKI LXXTSENGLD LEKXXXXXXI RRWQNKITDA 
 301 DTRRETVSAV SPKIVLNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA 
 351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV 
 401 VXLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRAAQE GRLKTNDRT* 



ORF33a and ORF33-1 show 94.1% identity in 444 aa overlap: 

orf33a.pep   MLNPSRKLVELVRILEEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAKMIDRNRMLRET  60
orf33-1      MLNPSRKLVELVRILDEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAEMIDRNRMLRET  60

orf33a.pep   LERVRAGSFWLWVAAATFAFXTXFSVTYLLMDNQGLNFFLVLAGVXGMNTLMLAVWLAML  120
orf33-1      LERVRAGSFWLWVVAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLAML  120

orf33a.pep   FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRXPSVRWKIGATSHSLWLCTLLGML  180
orf33-1      FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML  180

orf33a.pep   VSVLLLLLVRQYTFNWESTLLGDSSSVRLVEMLAWLPAKLGFPVPDARAVIEGRLNGNIA  240
orf33-1      VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA  240

orf33a.pep   DARAWSGLLVGSIACYGILPRLLAWAVCKILXXTSENGLDLEKXXXXXXIRRWQNKITDA  300
orf33-1      DARAWSGLLVGSIACYGILPRLLAWVVCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA  300

orf33a.pep   DTRRETVSAVSPKIVLNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVAANREQVAALE  360
orf33-1      DTRRETVSAVSPKIILNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE  360

orf33a.pep   TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVXLLAEQGLSDDLSEKLEHW  420
orf33-1      TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVQLLAEQGLSDDLSEKLEHW  420

orf33a.pep   RNALTECGAAWLEPDRAAQEGRLKTNDRTX  450
orf33-1      RNALAECGAAWLEPDRAAQEGRLKDQX  447



Homology with a predicted ORF from N. gonorrhoeae 

ORF33 shows 91.6% identity over a 143aa overlap with a predicted ORF (ORF33.ng) from N. gonorrhoeae: 

orf33.pep                                  LFLRVKVGRFFSSPATWFRXKDPVNQAVLR  30
orf33ng      LMDNQGLNFFLVLAGVLGMNTLMLAVWLATLFLRVKVGRFFSSPATWFRGKGPVNQAVLR  100

orf33.pep    LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA  90
orf33ng      LYADQWRQPSVRWKIGATAHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA  160

orf33.pep    VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL  143
orf33ng      VEMLAWLPSKLGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIVCYGILPRLLAWVVCK  220

An ORF33ng nucleotide sequence <SEQ ID 203> was predicted to encode a protein having amino acid sequence <SEQ ID 204>: 

   1 MIDRDRMLRD TLERVRAGSF WLWVVVASMM FTAGFSGTYL LMDNQGLNFF 
  51 LVLAGVLGMN TLMLAVWLAT LFLRVKVGRF FSSPATWFRG KGPVNQAVLR 
 101 LYADQWRQPS VRWKIGATAH SLWLCTLLGM LVSVLLLLLV RQYTFNWEST 
 151 LLSNAASVRA VEMLAWLPSK LGFPVPDARA VIEGRLNGNI ADARAWSGLL 
 201 VGSIVCYGIL PRLLAWVVCK ILLKTSENGL DLEKTYYQAV IRRWQNKITD 
 251 ADTRRETVSA VSPKIVLNDA PKWALMLETE WQDGQWFEGR LAQEWLDKGV 
 301 AANREQVAAL ETELKQKPAQ LLIGVRAQTV PDRGVLRQIV RLSEAAQGGA 
 351 VVQLLAEQGL SDDLSEKLEH WRNALTECGA AWLEPDRVAQ EGRLKDQ* 



Further sequence analysis revealed the following DNA sequence <SEQ ID 205>: 

   1 ATGTTGaatC CATCCCgaAA ACTGgttgag ctGgTCCgtA Ttttgaataa 
  51 agggggtTTT attttcagcg gcgatcctgt gcaggcgacg gaggctttgc 
 101 gccgcgtgga cggcAGTACG GAggAaaaaa tcttccgtcg GGCGGAGAtg 
 151 atcgACAGGg accgtatgtt gcgggACaCg TtggaacGTG TGCGTGCggg 
 201 gtcgtTctgG TTATGGGTGG TggtggCAtC gATGATGTtt aCCGCCGGAT 
 251 TTTCAGgcac ttatCttCTG ATGGACaatC AGGGGCtGAA TtTCTTTTTA 
 301 GTTTTggcgG GAGTGTtggG CATGaatacG CtgATGCTGG CAGTATGGtt 
 351 gGCAACGTTG TTCCTGCGCG TGAAAGTGGG ACGGTTTTTC AGCAGTCCGG 
 401 CGACGTGGTT TCGGGGCAAA GGCCCTGTAA ATCAGGCGGT GTTGCGGCTG 
 451 TATGCGGACC AGTGGCGGCA ACCTTCGGTA CGATGGAAAA TAGGCGCAAC 
 501 GGCGCACAGC TTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 
 551 TGCTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 
 601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC 
 651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGTC 
 701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC 
 751 GGCAGTATCG TCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGTAGT 
 801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGattgGAT TTGGAAAAAA 
 851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 
 901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCcgaAAA TCGTCTTGAA 
 951 CGATGCGCCG AAATGGGCGC TCATGCTGGA GACCGAGTGG CAGGACGGCC 
1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC 
1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 
1101 GGCGCAACTG CTTATCGGCG TACGCGCCCA AACTGTGCCG GACCGGGGCG 
1151 TGCTGCGGCA GATTGTGCGG CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG 
1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT 
1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTTGAGC 
1301 CTGACAGGGT GGCGCAGGAA GGCCGTTTGA AAGACCAATA A 



This encodes a protein having amino acid sequence <SEQ ID 206; ORF33ng-1>: 

   1 MLNPSRKLVE LVRILNKGGF IFSGDPVQAT EALRRVDGST EEKIFRRAEM 
  51 IDRDRMLRDT LERVRAGSFW LWVVVASMMF TAGFSGTYLL MDNQGLNFFL 
 101 VLAGVLGMNT LMLAVWLATL FLRVKVGRFF SSPATWFRGK GPVNQAVLRL 
 151 YADQWRQPSV RWKIGATAHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL 
 201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV 
 251 GSIVCYGILP RLLAWVVCKI LLKTSENGLD LEKTYYQAVI RRWQNKITDA 
 301 DTRRETVSAV SPKIVLNDAP KWALMLETEW QDGQWFEGRL AQEWLDKGVA 
 351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV 
 401 VQLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRVAQE GRLKDQ* 



ORF33ng-1 and ORF33-1 show 94.6% identity in 446 aa overlap: 

orf33-1.pep  MLNPSRKLVELVRILDEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAEMIDRNRMLRET  60
orf33ng-1    MLNPSRKLVELVRILNKGGFIFSGDPVQATEALRRVDGSTEEKIFRRAEMIDRDRMLRDT  60

orf33-1.pep  LERVRAGSFWLWVVAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLAML  120
orf33ng-1    LERVRAGSFWLWVVVASMMFTAGFSGTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLATL  120

orf33-1.pep  FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML  180
orf33ng-1    FLRVKVGRFFSSPATWFRGKGPVNQAVLRLYADQWRQPSVRWKIGATAHSLWLCTLLGML  180

orf33-1.pep  VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA  240
orf33ng-1    VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA  240

orf33-1.pep  DARAWSGLLVGSIACYGILPRLLAWVVCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA  300
orf33ng-1    DARAWSGLLVGSIVCYGILPRLLAWVVCKILLKTSENGLDLEKTYYQAVIRRWQNKITDA  300

orf33-1.pep  DTRRETVSAVSPKIILNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE  360
orf33ng-1    DTRRETVSAVSPKIVLNDAPKWALMLETEWQDGQWFEGRLAQEWLDKGVAANREQVAALE  360

orf33-1.pep  TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVQLLAEQGLSDDLSEKLEHW  420
orf33ng-1    TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVQLLAEQGLSDDLSEKLEHW  420

orf33-1.pep  RNALAECGAAWLEPDRAAQEGRLKDQX  447
orf33ng-1    RNALTECGAAWLEPDRVAQEGRLKDQX  447

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
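
The document does not state how the putative transmembrane domains were identified. As an illustration only, one common approach is a sliding-window hydropathy scan. The sketch below assumes the Kyte-Doolittle hydropathy scale with a 19-residue window and a mean-hydropathy cutoff of 1.6; these parameters, and the names `KYTE_DOOLITTLE` and `hydrophobic_windows`, are assumptions rather than the method used here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(sequence, window=19, threshold=1.6):
    """Return (1-based start, mean hydropathy) for each window at or above threshold."""
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        # Skip windows containing symbols outside the scale (for example X or *).
        if any(residue not in KYTE_DOOLITTLE for residue in segment):
            continue
        mean = sum(KYTE_DOOLITTLE[residue] for residue in segment) / window
        if mean >= threshold:
            hits.append((start + 1, round(mean, 2)))
    return hits


# Residues 171-200 and 301-330 of ORF33ng-1 <SEQ ID 206>, used as examples.
hydrophobic_stretch = "LWLCTLLGMLVSVLLLLLVRQYTFNWESTL"
hydrophilic_stretch = "DTRRETVSAVSPKIVLNDAPKWALMLETEW"

candidate_windows = hydrophobic_windows(hydrophobic_stretch)
background_windows = hydrophobic_windows(hydrophilic_stretch)
```

Under these assumed settings the leucine-rich stretch yields windows above the cutoff while the charged stretch yields none. A scan of this kind only flags candidate regions and does not by itself establish membrane topology.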

Example 25 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 207>: 

   1 ..CAGAAGAGTT TGTCGAGAAT TTCTTTATGG GGTTTGGGCG GCGTGTTTTT 
  51 CGGGGTGTCC GGTCTGGTAT GGTTTTCTTT GGGCGTTTCT TT.GAGTGCG 
 101 CCTGTTTTTC GGGTGTTTCT TTTCGGGGTT CGGGACGGGG GACGTTTGTG 
 151 GGCAGTACGG GGGTTTCTTT GAGTGTGTTT TCAGCTTGTG TTCC.GGCGT 
 201 CGTCCGGCTG CCTGTCGGTT TGAGCTGTGT CGGCAGGTTG CG..GTTTGA 
 251 CCCGGTTTTT CTTGGGTGCG GCAGGGGACG TCATTCTCCT GCCGCTTTCG 
 301 TCTGTGCCGT CCGGCTGTGC GGGTTCGGAT GAGGCGGCGT GGTGGTGTTC 
 351 GGGTTGGGCG GCATCTTGTT CCGACTACGC CGTTTGGCAG CCAGAATTCG 
 401 GTTTCGCGGG GGCTGTCGGT GTGTTGCGGT TCGGCTTGAA GGGTTTTGTC 
 451 GTCC 

This corresponds to the amino acid sequence <SEQ ID 208; ORF34>: 

   1 ..QKSLSRISLW GLGGVFFGVS GLVWFSLGVS XECACFSGVS FRGSGRGTFV 
  51 GSTGVSLSVF SACVXGVVRL PVGLSCVGRL XXLTRFFLGA AGDVILLPLS 
 101 SVPSGCAGSD EAAWWCSGWA ASCPTTPFGS QNSVSRGLSV CCGSA*RVLS 
 151 S.. 

Further work revealed the complete nucleotide sequence <SEQ ID 209>: 

   1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCkGGTG TGCCTGCCGT 
  51 GCCGGGTCAG AATAGGTTGT CCAGAATTTC TTTATGGGGT TTGGGCGGCG 
 101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTG 
 151 GGCTGCGCCT GTTTTTCGGG TGTTTCTTTT CGGGGTTCGG GACGGGGGAC 
 201 GTTTGTGGGC AGTACGGGGG TTTCTTTGAG TGTGTTTTCA GCTTGTGTTC 
 251 CGGCGTCGTC CGGCTGCCTG TCGGTTTGAG CTGTGTCGGC AGGTTGCGGT 
 301 TTGACCCGGT TTTTCTTGGG TGCGGCAGGG GACGGCAGTC CGCTGCCGCT 
 351 TTCGTCTGTG CCGTCCGGCT GTGCGGGTTC GGATGAGGCG GCGTGGTGGT 
 401 GTTCGGGTTG GGCGGCATCT TGTCCGACTA CGCCGTTTGG CAGCCAGAAT 
 451 TCGGTTTCGC GGGGGCTGTC GGTGTGTTGC GGTTCGGCTT GAAGGGTTTT 
 501 GTCGCCGTTC GGGTTGAATG TGCTGACGAT GCCTATTGCC AATGCGCCGA 
 551 TGGCGGCGAT ACAGATGAGC AATACGGCGC GTATCAGGAG TTTGGGGGTC 
 601 AGCCTGAAGG GTTTGTTCGG TTTTTTTGCC ATTTTGATTG TGCTTTTGGG 
 651 GTGTCGGGCA ATGCCGTCTG AAGGCGGTTC AGACGGCATT GCCGAGTCAG 
 701 CGTTGGACGT AGTTTTGGTA GAGGGTGATG ACTTTTTGTA CGCCGACGGT 
 751 GGTGCTGACT TTTTGGGTAA TCTGCGCCTG TTCTTCGGGG GTGAGGATGC 
 801 CCATAACGTA GGTTACGTTG CCGTAGGTAA CGATTTTGAC GCGCGCCTGT 
 851 GTGGCGGGGC TGATGCCCAA CAGCGTGGCG CGGACTTTGG ATGTGTTCCA 
 901 AGTGTCGCCG GCGATGTCGC CGGCAGTGCG CGGCAGGGAG GCGACGGTAA 
 951 TATAGTTGTA CACGCCTTCG GCGGCCTGTT CGGAACGTGC AATCTGACCG 
1001 ACGAACTGTT TTTCGCCTTC GGTGGCGACT TGTCCGAGCA GCAGCAGGTG 
1051 GCGGTTGTAG CCGACGACGG AGATTTGGGG CGTGTAGCCT TTGGTTTGGT 
1101 TGTTTTGGCG CAGATAGGAA CGGGCGGTGG TTTCGATACG CAACGCCATA 
1151 ACGTTGTCGT CGGTTTGCGC GCCGGTGGTT CGGCGGTCGA CGGCGGATTT 
1201 CGCGCCGACG GCGGCGCTTC CGATTACTGC GCTGACGCAG CCGCTAAGGG 
1251 CAAGGCTGAA AATGGCGGCA ATCAGGGTGC GGACGGTGTG CGGTTTGGGT 
1301 TTCATCGGGT GCTTCCTTTC TTGGGCGTTT CAGACGGCAT TGCTTTGCGC 
1351 CATGCCGTCT GA 

This corresponds to the amino acid sequence <SEQ ID 210; ORF34-1>: 

   1 MMMPFIMLPW IAGVPAVPGQ NRLSRISLWG LGGVFFGVSG LVWFSLGVSL 
  51 GCACFSGVSF RGSGRGTFVG STGVSLSVFS ACVPASSGCL SV*AVSAGCG 
 101 LTRFFLGAAG DGSPLPLSSV PSGCAGSDEA AWWCSGWAAS CPTTPFGSQN 
 151 SVSRGLSVCC GSA*RVLSPF GLNVLTMPIA NAPMAAIQMS NTARIRSLGV 
 201 SLKGLFGFFA ILIVLLGCRA MPSEGGSDGI AESALDVVLV EGDDFLYADG 
 251 GADFLGNLRL FFGGEDAHNV GYVAVGNDFD ARLCGGADAQ QRGADFGCVP 
 301 SVAGDVAGSA RQGGDGNIVV HAFGGLFGTC NLTDELFFAF GGDLSEQQQV 
 351 AVVADDGDLG RVAFGLVVLA QIGTGGGFDT QRHNVVVGLR AGGSAVDGGF 
 401 RADGGASDYC ADAAAKGKAE NGGNQGADGV RFGFHRVLPF LGVSDGIALR 
 451 HAV* 



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF34 shows 73.3% identity over a 161aa overlap with an ORF (ORF34a) from strain A of N. meningitidis: 
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orf 34 .pep 
orf34a 

orf 34 .pep 
orf34a 

orf 34. pep 
orf34a 



10 20 30 
OKSLSR ISLWGLGGVFPGVSGLyW FSLG VSXE CAC 

1 | III I I I I I I I I I I I t I I I I I I I I I 1 I Mi 
MMXPXIM LPWIAGVPA VPGQKRLS RXSLWGLGGXFFGVSGLVW FSLGVSXSLGVSXGCAC 

10 20 30 40 50 60 

40 50 60 70 80 90 
F5;r:V5;FRGSGRG TFVGSTGVSLSVFSACV XGWRLPVGLSCVGRLXX LTRFFLGA 

TTTTl I I I I I M I M I It I I I I M I I M : I:: III Ml 

r^nvqFRGSGRGT FVGSTGVSLSVFSACA PASSGCLSVXAVSAGCGLTRXFXGA 

70 80 90 100 110 

100 110 120 130 140 150 

AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 

,,, (ill Mill 111:11 I M IN I Mill I Ull I Ml II I I Mill: MM 
AGDGSPLPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLS 

130 140 150 160 170 



120 



30 orf 34. pep S 

orf 34a PFGXNVI TMPTaiJftPMAVTOMSMTARIRSL GVSLKGLFXFFAILIVLL GCRAMPSEGGSD 
180 190 200 210 220 230 

The complete length ORF34a nucleotide sequence <SEQ ID 211> is: 

   1 ATGATGATNC CGTTNATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT 
  51 GCCGGGTCAG AAGAGGTTGT CGAGAANTTC TTTATGGGGT TTAGGCGGCN 
 101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTNTT 
 151 TCTTTGGGTG TTTCTNTGGG CTGTGCCTGT TTTTCGGGTG TTTCTTTTCG 
 201 GGGTTCGGGA CGGGGGACGT TTGTGGGCAG TACNGGGGTT TCTTTGAGTG 
 251 TGTTTTCAGC TTGTGCTCCG GCGTCGTCCG GCTGCCTGTC GGTTTNAGCT 
 301 GTGTCGGCAG GTTGCGGTTT GACCCGGNTT TTCTTNGGTG CGGCAGGGGA 
 351 CGGCAGTCCG CTGCCGCTTT CGTCTGTGCC GTCCGGCTGT GCGGGTGCGG 
 401 ATGAGGAGGC GTNGTNGTGT TCGGGTTGGG CGGCATCTTG TCCGACTACG 
 451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG 
 501 TTCGGTNTGG AGGGTTTTGT CNCCGTTCGG GTNGAATGTG CTGACGATGC 
 551 CTATTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT 
 601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCNGTT TTTTTGCCAT 
 651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG 
 701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTNGGTAGA GGGTGATGAC 
 751 TTTTTGTACG CCGACGGTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT 
 801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACGTTGCC GTAGGTAACG 
 851 ATTTTGACGC GCGCCTGTGT GGCGGGGCTG ATGCCCAACA GCGTGGCGCG 
 901 GACTTTGGAT GTGTTCCAAG TGTCGCCGGC GATGTCGCCG GCAGTGCGCG 
 951 GCAGGGAGGC GACGGTAATG TANTTGTACA CGCCTTCGGC GGCCTGTTCG 
1001 GAACGTGCAA TCTGACCGAC GAACTGTTTC TCGCCTTCGG TGGCGACTTG 
1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACAACGGAG ATTTGGGGCG 
1101 TGTANCCTTT GGTTTGGTTG TTTTGGCGCA GATAGGAGCG GGCGGTGGTT 
1151 TCGATACGCA GCGCCATTAC GTTGTCGTCG GTTNGCGCGC CGGTGGTTCG 
1201 GCGGTCGACG GCGGATTTCG CGCCGACCGC CGCGCCGCCG ACGACTGCGC 
1251 TGACGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAGT CAGGGTGCGG 
1301 ACGGTGTGCG GTTTGGGTTT CATCGGGTGC TTCCTTTCTT GGGCGTTTCA 
1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA 

This encodes a protein having amino acid sequence <SEQ ID 212>: 

   1 MMXPXIMLPW IAGVPAVPGQ KRLSRXSLWG LGGXFFGVSG LVWFSLGVSX 
  51 SLGVSXGCAC FSGVSFRGSG RGTFVGSTGV SLSVFSACAP ASSGCLSVXA 
 101 VSAGCGLTRX FXGAAGDGSP LPLSSVPSGC AGADEEAXXC SGWAASCPTT 
 151 PFGSQNSVSR GLSVCCGSVW RVLSPFGXNV LTMPIANAPM AVIQMSNTAR 
 201 IRSLGVSLKG LFXFFAILIV LLGCRAMPSE GGSDGIAESA LDVVXVEGDD 
 251 FLYADGGADF LGNLRLFFGG EDAHNVGYVA VGNDFDARLC GGADAQQRGA 
 301 DFGCVPSVAG DVAGSARQGG DGNVXVHAFG GLFGTCNLTD ELFLAFGGDL 
 351 SEQQQVAVVA DNGDLGRVXF GLVVLAQIGA GGGFDTQRHY VVVGXRAGGS 
 401 AVDGGFRADR RAADDCADAA AEGKAEDGGS QGADGVRFGF HRVLPFLGVS 
 451 DGIALRHAV* 



ORF34a and ORF34-1 show 91.3% identity in 459 aa overlap: 

orf34a.pep   MMXPXIMLPWIAGVPAVPGQKRLSRXSLWGLGGXFFGVSGLVWFSLGVSXSLGVSXGCAC  60
orf34-1      MMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVSL------GCAC  54

orf34a.pep   FSGVSFRGSGRGTFVGSTGVSLSVFSACAPASSGCLSVXAVSAGCGLTRXFXGAAGDGSP  120
orf34-1      FSGVSFRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP  114

orf34a.pep   LPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLSPFGXNV  180
orf34-1      LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV  174

orf34a.pep   LTMPIANAPMAVIQMSNTARIRSLGVSLKGLFXFFAILIVLLGCRAMPSEGGSDGIAESA  240
orf34-1      LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA  234

orf34a.pep   LDVVXVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA  300
orf34-1      LDVVLVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA  294

orf34a.pep   DFGCVPSVAGDVAGSARQGGDGNVXVHAFGGLFGTCNLTDELFLAFGGDLSEQQQVAVVA  360
orf34-1      DFGCVPSVAGDVAGSARQGGDGNIVVHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAVVA  354

orf34a.pep   DNGDLGRVXFGLVVLAQIGAGGGFDTQRHYVVVGXRAGGSAVDGGFRADRRAADDCADAA  420
orf34-1      DDGDLGRVAFGLVVLAQIGTGGGFDTQRHNVVVGLRAGGSAVDGGFRADGGASDYCADAA  414

orf34a.pep   AEGKAEDGGSQGADGVRFGFHRVLPFLGVSDGIALRHAVX  460
orf34-1      AKGKAENGGNQGADGVRFGFHRVLPFLGVSDGIALRHAVX  454



Homology with a predicted ORF from N. gonorrhoeae 

ORF34 shows 77.6% identity over a 161aa overlap with a predicted ORF (ORF34.ng) from N. gonorrhoeae: 

orf 3 4. pep 



QKSLSRISLWGLGGVFFGVSGLVWFSLGVSXE- 
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orf34ng 
orf34.pep 



II till I MM: | ; N Ml |l 111 I IN I Ml 
MMMPFIMLPWIAGVPAVPGQKRLSRISLWGIAGVFFGVSGLWFSLGVSFSLGVSLGCAC 60 

FSGVSFRGSGRGTFVGSTGVSLSVFSACVXGWRLPVGLSCV GRLXXLTRFFLGA 90 

Ml Ml Mil | : | | M M I I I I I I I I I ! Ml: I : M 111111 * * 



orf34ng FSGVSFRGSGWGAFVGSTGVSLSVFSACVP VPVNESAARAASEGR GLTRFFLGA 114 

orf34 oeo AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 150 

P P Ml | | | | M II 1 1 M II II I I I H M I M M I M I H I I I 1 1 II M : M N 

orf34ng AGDGSPLPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLS 17 4 



175 



orf34.pep S 

orf34ng PFGLNVLTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSD 234 

The complete length ORF34ng nucleotide sequence <SEQ ID 213> is: 

   1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT 
  51 GCCGGGTCAA AAGAGGTTGT CGAGAATCTC TTTATGGGGT TTGGCCGGCG 
 101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTT 
 151 TCTTTGGGTG TTTCTTTGGG CTGCGCCTGT TTTTCGGGTG TTTCTTTTCG 
 201 GGGTTCGGGA TGGGGGGCGT TTGTGGGCAG TACGGGGGTT TCTTTGAGTG 
 251 TGTTTTCAGC TTGTGTTCCG GTGCCGGTTA ACGAATCGGC TGCCCGGGCC 
 301 GCATCCGAAG GGCGCGGTTT gACCCGGTTT TTCTTGGGTG CGGCAGGGGA 
 351 CGGCAGTCCG CTGCCGCTTT CTTCTGTGCC GTCCGGCTGT GCGGGTTCGG 
 401 ATGAGGCGGC GTGGTGGTGT TCGGGTTGGG CGGCATCTTG TCCGACGGCG 
 451 CCG^TTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG 
 501 TTCGGTTTGG AGGGTTTTGT CGCCGTTCGG GTTGAATGTG CTGACGATGC 
 551 CTACTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT 
 601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCGGTT TTTTTGCCAT 
 651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG 
 701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTTGGTAGA GGGTAATGAC 
 751 TTTTTGTACG CCGAcggTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT 
 801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACATTGCC GTAGGTAATG 
 851 ATTTTGACGC GCGCCTGTGT AGCGGGGCTG ATGCCCAGCA GcgtgGCGCG 
 901 GACTTTGGAC GTGTTCCAAG TGTCGCCGGC GATGTCGCCC GCAGTGCGCG 
 951 GCAGGGAGGC GACGGTAATG TAGTTGTATA CGCCTTCGGC GGCCTGTTCG 
1001 GAACGTGCAA TCTGACCGAC GAACTGTTTT TCGCCTTCGG TGGCGACTTG 
1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACGACGGAG ATTTGGGGCG 
1101 TGTAGCCTTT GGTTTGGTTG TTTTGGCGCA GGTAGGAACG GGCGGTGGTT 
1151 TCGATACGCA ACGCCATAAC GTtgtCATCG GTTtgcgcgc CGGTGGTTcg 
1201 gCGGTCGATG ACGGATTTTG CGCCGACGGC GGCCCCGCCG ACGACTGCGC 
1251 TGAAGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAAT CAGGGTGCGG 
1301 ACGGTGTGTG GTTTGGGTTT CATCGGGGAC TTCCTTTCTT GGGCGTTTCA 
1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA 

This encodes a protein having amino acid sequence <SEQ ID 214>: 

   1 MMMPFIMLPW IAGVPAVPGQ KRLSRISLWG LAGVFFGVSG LVWFSLGVSF 
  51 SLGVSLGCAC FSGVSFRGSG WGAFVGSTGV SLSVFSACVP VPVNESAARA 
 101 ASEGRGLTRF FLGAAGDGSP LPLSSVPSGC AGSDEAAWWC SGWAASCPTA 
 151 PFGSQNSVSR GLSVCCGSVW RVLSPFGLNV LTMPTANAPM AVIQMSNTAR 
 201 IRSLGVSLKG LFGFFAILIV LLGCRAMPSE GGSDGIAESA LDVVLVEGND 
 251 FLYADGGADF LGNLRLFFGG EDAHNVGYIA VGNDFDARLC SGADAQQRGA 
 301 DFGRVPSVAG DVARSARQGG DGNVVVYAFG GLFGTCNLTD ELFFAFGGDL 
 351 SEQQQVAVVA DDGDLGRVAF GLVVLAQVGT GGGFDTQRHN VVIGLRAGGS 
 401 AVDDGFCADG GPADDCAEAA AEGKAEDGGN QGADGVWFGF HRGLPFLGVS 
 451 DGIALRHAV* 

ORF34ng and ORF34-1 show 90.0% identity in 459 aa overlap: 

orf34-1.pep  MMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVS------LGCAC  54
orf34ng      MMMPFIMLPWIAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVSLGCAC  60

orf34-1.pep  FSGVSFRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP  114
orf34ng      FSGVSFRGSGWGAFVGSTGVSLSVFSACVPVPVNESAARAASEGRGLTRFFLGAAGDGSP  120

orf34-1.pep  LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV  174
orf34ng      LPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLSPFGLNV  180

orf34-1.pep  LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA  234
orf34ng      LTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA  240

orf34-1.pep  LDVVLVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA  294
orf34ng      LDVVLVEGNDFLYADGGADFLGNLRLFFGGEDAHNVGYIAVGNDFDARLCSGADAQQRGA  300

orf34-1.pep  DFGCVPSVAGDVAGSARQGGDGNIVVHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAVVA  354
orf34ng      DFGRVPSVAGDVARSARQGGDGNVVVYAFGGLFGTCNLTDELFFAFGGDLSEQQQVAVVA  360

orf34-1.pep  DDGDLGRVAFGLVVLAQIGTGGGFDTQRHNVVVGLRAGGSAVDGGFRADGGASDYCADAA  414
orf34ng      DDGDLGRVAFGLVVLAQVGTGGGFDTQRHNVVIGLRAGGSAVDDGFCADGGPADDCAEAA  420

orf34-1.pep  AKGKAENGGNQGADGVRFGFHRVLPFLGVSDGIALRHAVX  454
orf34ng      AEGKAEDGGNQGADGVWFGFHRGLPFLGVSDGIALRHAVX  460






Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 26

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 215>:

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT
51 CGCCGCCTGC GGATT.CAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG
101 CCGCCGCCGA CAACGGCGCG GCGTAAAAAA GAAATCGTCT TCGGCACGAC
151 CGTCGGCGAC TTCGGCGATA TGGTCAAAGA ACAAATCCAA GCCGAGCTGG
201 AGAAAAAAGG CTACACCGTC AAACTGGTCG AGTTTACCGA CTATGTACGC
251 CCGAATCTGG CATTGGCTGA GGGCGAGTTG

This corresponds to the amino acid sequence <SEQ ID 216; ORF4>:

1 MKTFFKTLSA AALALILAAC G.QKDSAPAA SASAAADNGA AKKEIVFGTT
51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GEL

Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 217>: 



1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT
51 CGCCGCCTGC GGCGGTCAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG
101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC
151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAG CCGAGCTGGA
201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTACGCC
251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC
301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA
351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA
401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC
451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT
501 CAAACTCAAA GACGGCATCA ATCCGTTGAC CGCATCCAAA GCGGACATCG
551 CCGAGAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG
601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC
651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT
701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA
751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA
801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG
851 GCGCAGCCAA ATAA

This corresponds to the amino acid sequence <SEQ ID 218; ORF4-1>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT
51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH
101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND
151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL
201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ
251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK*
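
The correspondence between a nucleotide sequence and its encoded amino acid sequence can be checked with a short translation routine. The following is a minimal sketch in Python, assuming the standard genetic code and translation in the first reading frame. The function name `translate` and the use of X for codons containing ambiguous or missing bases are illustrative choices and are not taken from the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 1; whitespace is ignored.

    Codons with ambiguous bases (e.g. N) or gap characters give 'X'.
    A trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 21 bases of the sequence given above, with its block spacing retained.
print(translate("ATGAAAACCT TCTTCAAAAC C"))
```

Applied to a complete listing, the same call returns the full predicted protein, with * marking the stop codon.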

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF4 shows 93.5% identity over a 93aa overlap with an ORF (ORF4a) from strain A of N.
meningitidis:

?<: 10 20 30 40 50 59 

orf4 oep MKTFFKTLS AAALALIIiAA CG-QKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 

pp hi ii mi m ii ii i n ii n 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 \ 1 1 

or£4a MKT ^FKTLS AAALALILAA CGGQKDSAPAASASAAADNGAAXKEIVFGTT VGDFGDMVKE 
10 20 30 40 50 60 



30 



60 70 80 90 . 

orf4 pep QIQAELEKKGYTVKLVEFTDYVRPNLALAEGEL 
|| llll Mill I I II Mill I II HI II I 



orf4a XICPELEKKGYTVKLTOXTDYVRXNLALAEGELDINVXQHXXYLDD 
25 " 70 80 90 100 110 120 

nrf4a yPTAPLGLYPGKLKSLXXVKXGSTVSAPNDPXXFXRVLVMLDELGXIKLKDXIXXXXXXX 
130 140 150 160 170 180 

The complete length ORF4a nucleotide sequence <SEQ ID 219> is: 

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT
51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG
101 CCGCCGCCGA CAACGGCGCG GCGAANAAAG AAATCGTCTT CGGCACGACC
151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CANATCCAAC CCGAGCTGGA
201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTNTACCGAC TATGTGCGCN
251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTNCAACAC
301 ANACNCTATC TTGACGACTN CAAAAAANAA CACAATCTGG ACATCACCNN
351 AGTCTTNCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA
401 AATCGCTGGA NNAAGTCAAA GANGGCAGCA CCGTATCCGC GCCCAACGAC
451 CCGTNNNACT TCGNCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTKGAT
501 CAAACTCAAA GACNGCATCA NNNNGNNGKN NNNANCNANA NNNGANANNN
551 NNNNANNNNT NNNNNNNNNK NNNNNCNNCG NNNNNNNANN NNNNNNNNNN
601 NCGNNTNNNN NNGCNNNNNT NNANNNTNNN NNCNNCNNNN NNNNNTNNNN
651 NANNANNAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT
701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA
751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA
801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG
851 GCGCAGCCAA ATAA

This is predicted to encode a protein having amino acid sequence <SEQ ID 220>: 

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AXKEIVFGTT
51 VGDFGDMVKE XIQPELEKKG YTVKLVEXTD YVRXNLALAE GELDINVXQH
101 XXYLDDXKKX HNLDITXVXQ VPTAPLGLYP GKLKSLXXVK XGSTVSAPND
151 PXXFXRVLVM LDELGXIKLK DXIXXXXXXX XXXXXXXXXX XXXXXXXXXX
201 XXXXAXXXXX XXXXXXXXXS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ
251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK*

A leader peptide is underlined. 

Further analysis of these strain A sequences revealed the complete DNA sequence <SEQ ID 221>: 

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT
51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG
101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC
151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAC CCGAGCTGGA
201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTGCGCC
251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC
301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA
351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA
401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC
451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT
501 CAAACTCAAA GACGGCATCA ATCCGCTGAC CGCATCCAAA GCGGACATTG
551 CCGAAAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG
601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC
651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT
701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA
751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA
801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG
851 GCGCAGCCAA ATAA

This encodes a protein having amino acid sequence <SEQ ID 222; ORF4a-1>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT
51 VGDFGDMVKE QIQPELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH
101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND
151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL
201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ
251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK*

ORF4a-1 and ORF4-1 show 99.7% identity in 287 aa overlap:

                    10        20        30        40        50        60
orf4a-1     MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1      MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf4a-1     QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ
            ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1      QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf4a-1     VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1      VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf4a-1     ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1      ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS
                   190       200       210       220       230       240

                   250       260       270       280
orf4a-1     AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX
            ||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1      AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX
                   250       260       270       280

Homology with an outer membrane protein of Pasteurella haemolytica

ORF4 and this outer membrane protein show 33% aa identity in 91 aa overlap:



lip2. pasha 
0RF4 



10 20 
MN FKKLLGVALVS ALALTACKDEKAQAP 

M | : : I I I I I : I I : I : I 
VXTPNPDGRTPCPSFLFETATTSGENMKTFFKTLSAAAL— ALILAACGFKKTARPPHPL 

HO 120 130 140 150 



10 



15 



3C 40 50 60 70 80 

lit>2 Pasha -ATTAKTENKAPLKVGVMTGPEAQMTEVAVKIAKEKYGLDVELVQFTEYtQPNAALHSKD 

ORF4 LPPPTTARRKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNIALAEGE 
160 170 180 190 200 210 

90 100 110 120 130 140 

lip2 . pasha LDANAFQTVPYLEQEVKDRGYKLAIIGNTLVWPIAAYSKKIKNISELKDGATVAIPNNAS 
i 



ORF4 



Homology with a predicted ORF from N. gonorrhoeae

ORF4 shows 93.6% identity over a 94aa overlap with a predicted ORF (ORF4.ng) from N. gonorrhoeae:



orf 4nm.pep 
orf 4ng 



orf 4nm.pep 
orf 4ng 

orf 4nm.pep 
orf 4ng 



10 20 30 

MKTFFKTLSAAALALILAACGXQKDSAPAA 

I I I I I Ml I : I : I I I M t 1 I I MillMI 
RANAVXTPNPDGRTPCLSFLFETATTSGENMKTFFKTLSTASLALILAACGGQKDSAPAA 
200 210 220 230 240 250 

40 50 60 70 80 89 

SASA-AADNGAAKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALA 

||:| : | | | | | | I M I I I ! I I I It I I I I I M I I I II I I I I I I I I I I I I I I I M I I I I I I I 
SAAAPSADNGAAKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALA 

260 270 280 290 300 310 

90 
EGEL 

EGELDINVFQHKPYLDDFKKEHNLDITEAFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPN 
320 330 340 350 360 370

The complete length ORF4ng nucleotide sequence <SEQ ID 223> was predicted to encode a
protein having amino acid sequence <SEQ ID 224>:

1 MKTFFKTLST ASLALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT
51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ
101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN
151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ
201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS
251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK*

Further analysis revealed the complete length ORF4ng DNA sequence <SEQ ID 225> to be: 



1 atgAAAACCT TCTTCAAAAC cctttccgcc gccgcaCTCG CGCTCATCCT
51 CGCAGCCTGc ggCggtcaAA AAGACAGCGC GCCCgcagcc tctgcCGCCG
101 CCCCTTCTGC CGATAACGgc gCgGCGAAAA AAGAAAtcgr CtTCGGCACG
151 Accgtgggcg acttcggcgA TAtggTCAAA GAACAAATCC AagcCGAgct
201 gGAGAAAAAA GgctACACcg tcAAattggt cgaatttacc gactatgtGC
251 gCCCGAATCT GGCATTGGCG GAGGGCGAGT TGGACATCAA CGTCTTCCAA
301 CACAAACCCT ATCTTGACGA TTTCAAAAAA GAACACAACC TGGACATCAC
351 CGAAGCCTTC CAAGTGCCGA CCGCGCCTTT GGGACTGTAT CCGGGCAAAC
401 TGAAATCGCT GGAAGAAGTC AAAGACGGCA GCACCGTATC CGCGCCCAac
451 gACccgTCCA ACTTCGCACG CGCCTTGGTG ATGCTGAACG AACTGGGTTG
501 GATCAAACTC AAAGACGGCA TCAATCCGCT GACCGCATCC AAAGCCGACA
551 TCGCGGAAAA CCTGAAAAAC ATCAAAATCG TCGAGCTTGA AGCCGCACAA
601 CTGCCGCGCA GCCGCGCCGA CGTGGATTTT GCCGTCGTCA ACGGCAACTA
651 CGCCATAAGC AGCGGCATGA AGCTGACCGA AGCCCTGTTC CAAGAGCCGA
701 GCTTTGCCTA TGTCAACTGG TCTGCCgtcA AAACCGCCGA CAAAGACAGC
751 CAATGGCTTA AAGACGTAAC CGAGGCCTAT AACTCCGACG CGTTCAAAGC
801 CTACGCGCAC AAACGCTTCG AGGGCTACAA ATACCCTGCC GCATGGAATG
851 AAGGCGCAGC CAAATAA

This encodes a protein having amino acid sequence <SEQ ID 226; ORF4ng-1>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT
51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ
101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN
151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ
201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS
251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK*



This shows 97.6% identity in 288 aa overlap with ORF4-1: 






10 20 30 40 50 59 

orf4-l.pep MKTFFKTLSAAAIALIIJUVCGGQKDSAPAASASA-AADNGAAKKEIVFGTTVGDFGDMVK 
I I I I I I I I ! I I I I I I I I I M t I I I I I I I I I II : I : I I I II II I I I I I I I I I I 1 I I I I I I 
orf 4ng- 1 MKT FFKTLSAAALALI LAACGGQKDSAPAAS AAAPS ADNGAAKKEIVFGTTVGDFGDMVK 

10 20 30 40 50 60 

60 70 80 90 100 110 119 

orf 4-1 . pep EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVF 
I I I I I I I I I I I I I I I 1 1 I I I I I I I I I f I 1 I I I I I 1 I II I I II I I I I I I I I I I i I I I I I : 1 
orf4ng-l EQIQAEIXKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 

70 80 90 100 110 120 

120 130 140 150 160 170 179 

orf 4-1 . pep QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTAS 
I I | | | I I I I I II I I I I I I I I I I 11 I I I I I I I I I I I I I : I II I : 11 I I I I I I I I I I I I I I I 
orf4ng-l QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS 

130 140 150 160 170 180

180 190 200 210 220 230 239

orf 4-1 .pep KADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNW 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I t I I 1 I I I I I I I I I 
orf4ng-l KADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNW 

190 200 210 220 230 240 

240 250 260 270 280 

or f 4 - 1 . pep SAVKTADKDSQWLKDVTEAYN S DAFKAYAHKR FEGYKS PAAWNEG AAKX 

I I I I 1 I I I I t 1 I i I f 1 1 1 t I !! I 1 I I I I I I I I I I I I 1 fill I I I I I I I 
orf4ng-l SAVKTADKDSQWLKDVTEAYN S DA FKAYAHKRFEGYKY PAAWNEG AAKX 

250 260 270 280 
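
The "% identity in N aa overlap" figures quoted with the alignments above can be reproduced from any pair of aligned sequences. The following is a minimal sketch in Python, assuming that identity is counted as the number of identical aligned residues divided by the total number of alignment columns, with gaps written as "-". The function name `percent_identity` is hypothetical, and the alignment program used in the original work may count the overlap differently.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same residue in both rows.

    Both arguments are rows of one alignment: equal length, gaps as '-'.
    A column containing a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Ten-column excerpt with one substitution (P versus A).
print(percent_identity("QIQPELEKKG", "QIQAELEKKG"))
# Ten-column excerpt with two substitutions and one gap.
print(percent_identity("SASA-AADNG", "SAAAPSADNG"))
```

Rounding the result to one decimal place gives values in the same form as those reported in the text.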



In addition, ORF4ng-1 shows significant homology with an outer membrane protein from the
database:

ID LIP2_PASHA STANDARD; PRT; 276 AA.
AC Q08869;
DT 01-NOV-1995 (REL. 32, CREATED)
DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)
DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE)
DE 28.2 KD OUTER MEMBRANE PROTEIN PRECURSOR . . . .
SCORES Init1: 279 Initn: 416 Opt: 494
Smith-Waterman score: 494; 36.0% identity in 275 aa overlap

55 

10 20 30 40 50 

orf4ng-l.pep MKT F FKT L SAAAL—AL I LAACGGQKDSAPAAS AAAPS ADNGAAKKE I VFGTTVGDFGDM 
I I I : : I I 111:11 : I : I 1 I : : I : : : I I I I : : I : : I 

lip2 pasha MNFKKLLGVALVSALALTACKDEKAQAPATTA KTENKAPLK VGVMTGPEAQM 

60 " 10 20 30 40 50 

60 70 80 90 100 110 

orf4ng-l.pep VKEQIQAELEIQCGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITE 
:: :: III I : I I : I I : I : : I I II :ll 1:11 Ml:: I::: :: 
lip2 pasha TEVAVKIAKEKYGLDVELVQFTEYTQPNAALHSKDLDANAFQTVPYLEQEVKDRGYKLAI
60 70 80 90 100 110

120 130 140 150 160 170 

5 orf 4ng-l . pep afqvptaplglypgklksleevkdgstvsapndpsnfahalvmlnelgwiklkdginplt 

: : : I : : I I : I : : I : I I I : If : It: II 1 1 1 I : : | : I : I I I I I : 

lip2 pasha igntlvwpiaayskkikniselkdgatvaipnnasntarallllqahgllklkdpkn-vf 

120 130 140 150 160 170 

10 180 190 200 210 220 230 

orf 4ng-l . pep ASKADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTE — ALFQEPSFA 

I : : I I I I II I I I I : : : : I I I I : : M : ! : : I I : : 1 : : : : : : 
lio2 pasha ATENDIIENPKNIKIVQADTSLLTRMLDDVELAVINNTYAGQAGLSPDKDGIIVESKDSP 
■ " 180 190 200 210 220 230 

15 

240 250 260 270 280 289 

orf 4ng-l . pep YVNWSAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX 

Ml : : : I | : I : ::::::: 111:! 
Iip2 pasha YVNLWSREDNKDDPRLQTFVKSFQTEEVFQEALKLFNGGWKGW 
20 " 240 250 260 270 

Based on this analysis, including the homology with the outer membrane protein of Pasteurella
haemolytica, and on the presence of a putative prokaryotic membrane lipoprotein lipid attachment
site in the gonococcal protein, it was predicted that these proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
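
A putative lipoprotein lipid attachment site of the kind mentioned above can be located by scanning the N-terminal region of a protein for a short "lipobox" motif ending in cysteine. The following is a minimal sketch in Python. The regular expression is a simplified, illustrative pattern and is an assumption; it is not the pattern used in the original analysis, and the function name `find_lipobox` and the 35-residue search limit are likewise hypothetical.

```python
import re

# Simplified lipobox: three small or hydrophobic residues followed by the
# cysteine that would carry the lipid. Illustrative assumption only.
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")


def find_lipobox(protein, max_cys_position=35):
    """Return the 1-based position of the lipobox cysteine, or None.

    Only the first `max_cys_position` residues are searched, since the
    motif is expected at the end of an N-terminal signal peptide.
    """
    match = LIPOBOX.search(protein[:max_cys_position])
    return None if match is None else match.end()


# N-terminal 30 residues of ORF4ng-1 as listed above.
print(find_lipobox("MKTFFKTLSAAALALILAACGGQKDSAPAA"))
```

A hit from such a scan only flags a candidate site; it does not by itself show that the protein is lipid-modified.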

ORF4-1 (30kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figures 8A and
8B show, respectively, the results of affinity purification of the His-fusion and GST-fusion
proteins. Purified His-fusion protein was used to immunise mice, whose sera were used for ELISA
(positive result), Western blot (Figure 8C), FACS analysis (Figure 8D), and a bactericidal assay
(Figure 8E). These experiments confirm that ORF4-1 is a surface-exposed protein, and that it is a
useful immunogen.

Figure 8F shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF4-1. 
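
Profiles of this general kind are computed by averaging a per-residue scale over a sliding window along the sequence. The following is a minimal sketch in Python, assuming the Kyte-Doolittle hydropathy scale and a 7-residue window. Both are assumptions made for illustration: the scale and window actually used for the figures are not stated in this section, and hydropathy runs in the opposite sense to hydrophilicity (high values indicate hydrophobic stretches).

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Mean hydropathy of each full window along the sequence.

    Unknown residues (e.g. X) score 0. Returns an empty list when the
    sequence is shorter than the window.
    """
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# N-terminal 20 residues of ORF4-1 as listed above.
profile = hydropathy_profile("MKTFFKTLSAAALALILAAC")
print([round(value, 2) for value in profile])
```

Plotting the returned values against window position gives a curve in which the hydrophobic leader region appears as a sustained positive stretch.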

Example 27

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 227>: 

1 CCTCGTCGTC CTCGGCATGC TCCAGTTTCA AGGGGCGATT TACTCCAAGG
51 CGGTGGAACG TATGCTCGGC ACGGTCATCG GGCTGGGCGC GGGTTTGGGC
101 GTTTTATGGC TGAACCAGCA TTATTTCCAC GGCAACCTCC TCTTCTACCT
151 CACCGTCGGC ACGGCAAGCG CACTGGCCGG CTGGGCGGCG GTCGGCAAAA
201 ACGGCTACGT CCCTmTGCTG GCAGGGCTGA CGATGTGTAT GCTCATCGGC
251 GACAACGGCA GCGAATGGCT CGACAGCGGA CTCATGCGCG CCATGAACGT
301 CCTCATCGGC GyGGCCATCG CCATCGCCGC CGCCAAACTG CTGCCGCTGA
351 AATCCACACT GATGTGGCGT TTCATGCTTG CCGACAACCT GGCCGACTGC
401 AGCAAAATGA TTGCCGAAAT CAGCAACGGC AGGCGCATGA CCCGCGAACG
451 CCTCGAGGAG AACATGGCGA AAATGCGCCA AATCAACGCA CGCATGGTCA
501 AAAGCCGCAG CCATCTCGCC GCCACATCGG GCGAAAGCTG CATCAGCCCC
551 GCCATGATGG AAGCCATGCA GCACGCCCAC CGTAAAATCG TCAACACCAC
601 CGAGCTGCTC CTGACCACCG CCGCCAAGCT GCAATCTCCC AAACTCAACG
651 GCAGCGAAAT CCGGCTGCTT GACCGCCACT TCACACTGCT CCAAAC....
701 ...GC agACACGCCC GCCGCATCCG
751 CATCGACACC GCCATCAACC CCGAACTGGA AGCCCTCGCC GAACACCTCC
801 ACTACCAATG GCAGGGCTTC CTCTGGCTCA GCACCGATAT GCGTCAGGAA
851 ATTTCCGCCC TCGTCATCCT GCTGCAACGC ACCCGCCGCA AATGGCTGGA
901 TGCCCACGAA CGCCAACACC TGCGCCAAAG CCTGCTTGA

This corresponds to the amino acid sequence <SEQ ID 228; ORF8>:



1 PRRP RHAPVSRGDL LQGGGTYARH GHRAGRGFGR FMAEPALFPR 

51 QPPLLPHRRH GKRTGRLGGG RQKRLRPXAG RADDVYAHRR QRQRMARQRT 

101 HARHERPHRR GHRHRRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ

151 AHDPRTPRGE HGENAPNQRT HGQKPQPSRR HIGRKLHQPR HDGSHAARPP 

201 XNRQHHRAAP DHRRQAAISQ TQRQRNPAAX PPLHTAPN Q 

251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGLP LAQHRYASGN FRPRHPAATH 

301 PPQMAGCPRT PTPAPKPA* 

Computer analysis of this amino acid sequence gave the following results:
Sequence motifs

ORF8 is proline-rich and has a distribution of proline residues consistent with a surface
localization. Furthermore the presence of an RGD motif may indicate a possible role in bacterial
adhesion events.
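
Both observations can be checked directly on the listed sequence. The following is a minimal sketch in Python that reports the 1-based positions of a short motif and the fraction of a given residue. The function names `motif_positions` and `residue_fraction` are hypothetical, and the example uses only the first 44 residues of ORF8 as listed above.

```python
def motif_positions(protein, motif="RGD"):
    """Return the 1-based start positions of every occurrence of `motif`."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = protein.find(motif, start + 1)
    return positions


def residue_fraction(protein, residue="P"):
    """Fraction of the sequence made up of `residue` (0.0 for an empty string)."""
    return protein.count(residue) / len(protein) if protein else 0.0


# First row of the ORF8 listing, with spaces removed.
orf8_start = "PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR"
print(motif_positions(orf8_start))
print(round(residue_fraction(orf8_start), 3))
```

Running the same functions on the complete sequence gives the overall proline content and every RGD occurrence.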
Homology with a predicted ORF from N. gonorrhoeae

ORF8 shows 86.5% identity over a 312aa overlap with a predicted ORF (ORF8.ng) from N.
gonorrhoeae:

orfftna 1 MDRDDRLRRPRHAPVPRRDLLQRGGTYARYGHRAGRGFGRFMAEPALFPR 50 
° 9 I II I till I llll t I I I I I ^ 1 I I t I 1 1 t I I I I I 1 1 I I I 1 I 

25 orf8.pep 1 PRRPRHAPVSRGDLLQGGGTYARHGKRAGRGFGR FMAEPALFPR 44 

orfSna 51 QPPLLPDHRHGKRTGRLGGGRQKRLRPYVGGADDVHAHRRQRQRMARQRP 100 

9 Mini I I 1 I I 1 1 1 I I I I I I M I I I I 1111=1111111111111 

or f 8. pep 45 QPPLLPHRRHGKRTGRLGGGRQKRLRPXAGRADDVYAHRRQRQRMARQRT 94 

orf8na 101 DARDERPHRRRHRHCRRQTAAAEIHTDVAFHACRQPGRLQQNDCRNQQRQ 150 

9 11 1 1 1 1 1 1 111 1 1 1 1 1 1 1 1 1 1 1 mini 11 1 Mi 11 1 11 

or f 8. pep 95 HARHER PHRRGHRHRRRQTAAAE I HT DVAFHACRQPGRMQQNDCRNQQRQ 14 4 

15 orfSna 151 AYDARTFGAEYGQNAPNQRTHGQKPQPPRRHIGRKPHQPLHDGSHAARPP 200 

9 1 : 1 II 1:1: II II lllll Ml II I I 111 1 1 III III II I I III 

orf8.pep 14 5 AHDPRTPRGEHGENAPNQRTHGQKPQPSRRHIGRKLHQPRHDGSHAARPP 194 

orfSna 201 QNRQHHRAAPDHRRQAAISQTQRQRNPAARPPLHTAPNRPATNRRPHQRQ 250 

40 ^ " 1 11 1 I I I I I 1 I 1 I I 1 I I I I I I 1 I 1 I 1 1 I IIMtlM t 

orfS.pep 195 XNRQHHRAAP DHRRQAAI SQTQRQRN PAAXPPLHT APN Q 244 

orf8na 251 TRPPH PHRHRHQPRTG S PRRT P PL PMAG FP1AQHQ YASGN FRPRH P PAT H 300 

y MiiiimimiitiiMiMiiiii inn.niiiiiiiii in 

45 orfS.pep 245 TRPPHPHRHRHQPRTGS PRRT PPLPMAG LP LAQHRYASGN FRPRHPAATH 294 

orf8ng 301 PPQMAGCPRT PTPAPKPA* 319 

I II M I II 1 1 1 1 M M M I 
orf8.pep 295 PPQMAGCPRT PTPAPKPA* 313 

The complete length ORF8ng nucleotide sequence <SEQ ID 229> is predicted to encode a protein
having amino acid sequence <SEQ ID 230>:

1 MDRDDRLRRP RHAPVPRRDL LQRGGTYARY GHRAGRGFGR FMAEPALFPR
51 QPPLLPDHRH GKRTGRLGGG RQKRLRPYVG GADDVHAHRR QRQRMARQRP
101 DARDERPHRR RHRHCRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ
151 AYDARTFGAE YGQNAPNQRT HGQKPQPPRR HIGRKPHQPL HDGSHAARPP
201 QNRQHHRAAP DHRRQAAISQ TQRQRNPAAR PPLKTAPNRP ATNRRPHQRQ
251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGFP LAQHQVASGN FRPRHPPATH
301 PPQMAGCPRT PTPAPKPA*

Based on the sequence motifs in these proteins, it is predicted that the proteins from N. meningitidis
and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.



Example 28 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 231>:

1 .GAAATCAGCC TGCGGTCCGA CNACAGGCCG GTTTCCGTGN CGAAGCGGCG
51 GGATTCGGAA CGTTTTCTGC TGTTGGACGG CGGCAACAGC CGGCTCAAGT
101 GGGCGTGGGT GGAAAACGGC ACGTTCGCAA CCGTCGGTAG CGCGCCGTAC
151 CGCGATTTGT CGCCTTTGGG CGCGGAGTGG GCGGAAAAGG CGGATGGAAA
201 TGTCCGCATC GTCGGTTGCG CTGTGTGCGG AGAATTCAAA AAGGCACAAG
251 TGCAGGAACA GCTCGCCCGA AAAATCGAGT GGCTGCCGTC TTCCGCACAG
301 GCTTT.GGCA TACGCAACCA CTACCGCCAC CCCGAAGAAC ACGGTTCCGA
351 CCGCTGGTTC AACGCCTTGG GCAGCCGCCG CTTCAGCCGC AACGCCTGCG
401 TCGTCGTCAG TTGCGGCACG GCGGTAACGG TTGACGCGCT CACCGATGAC
451 GGACATTATC TCGGAGA.GG AACCATCATG CCCGGTTTCC ACCTGATGAA
501 AGAATCGCTC GCCGTCCGAA CCGCCAACCT CAACCGGCAC GCCGGTAAGC
551 GTTATCCTTT CCCGACCGG.



This corresponds to the amino acid sequence <SEQ ID 232; ORF61>: 

1 ..EISLRSDXRP VSVXKRRDSE RFLLLDGGNS RLKWAWVENG TFATVGSAPY
51 RDLSPLGAEW AEKADGNVRI VGCAVCGEFK KAQVQEQLAR KIEWLPSSAQ
101 AXGIRNHYRH PEEHGSDRWF NALGSRRFSR NACVVVSCGT AVTVDALTDD
151 GHYLGXGTIM PGFHLMKESL AVRTANLNRH AGKRYPFPT..

Further work revealed the complete nucleotide sequence <SEQ ID 233>: 



1 ATGACGGTTT TGAAGCTTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA
51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC
101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG
151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT
201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA
251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG
301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT
351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG
401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT
451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGTC GGCGCGCCTT
501 GTCGCGTTTA GGTTTGGATG TGCAGATTAA GTGGCCCAAT GATTTGGTTG
551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC
601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTTG TCCTGCCCAA
651 GGAAGTAGAA AATGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC
701 GGCGGGGCAA TGCCGATGCC GCCGTGCTGC TGGAAACGCT GTTGGTGGAA
751 CTGGACGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT
801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT
851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA
901 CAAGGCGTTT TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG
951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC
1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC
1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC
1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG
1151 GAAATGTCCG CATCGTCGGT TGCGCTGTGT GCGGAGAATT CAAAAAGGCA
1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC
1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT
1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC
1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA
1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA
1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG
1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT
1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA
1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG
1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT
1701 GCGCGTGGCG GACAACCTCG TCATTTACGG GTTGTTGAAC ATGATTGCCG
1751 CCGAAGGCAG GGAATATGAA CATATTTAA

This corresponds to the amino acid sequence <SEQ ID 234; ORF61-1>:

1 MTVLKLSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG
51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY
151 ELGSLSPVAA VACRRALSRL GLDVQIKWPN DLVVGRDKLG GILIETVRTG
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLVE
251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG
301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGEFKKA
401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK
501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA
551 AKVAEALPPA FLAENTVRVA DNLVIYGLLN MIAAEGREYE HI*

Figure 9 shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF61-1. Further
computer analysis of this amino acid sequence gave the following results:
Homology with the baf protein of B. pertussis (accession number U12020)
ORF61 and baf protein show 33% aa identity in 166aa overlap:

orf61 23 LLLDGGNSRLKWAWVE-NGT FATVGSAPYR DLSPLGAEWAEKADGNVRIVGCAVCG 77 

+L+D GNSRLK W + + A AP DL LG A R +G V G 

baf 3 IL I DSGN SRLKVGW FDP DAPQAARE PAPVAFDNLDLDALGRWLAT LPRR PQRALG VNVAG 62 

25 orf61 78 EFKKAQVQEQLAR KIEWLPSSAQAXGIRNHYRHPEEHGSDRW FNALGSRRFSRN 131 

^ + + L I WL + A G+RN YR+P++ G+DRW L + 

baf 63 LARGEAIAATLRAGGCDIRWIJIAQPLAMGLRNGYRNPDQ^ 122 

orf61 132 ACWVSCGTAVTVDALT DDGHYLGXGT IMPGFHLMKESLAVRTANL 177 
30 +V S GTA T+D + D + G G I+PG +M+ +LA TA+L 

baf 123 PLLVASFGTATTLDTIGPDNVFPG-GLILPGPAMMRGALAYGTAHL 167 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF61 shows 97.4% identity over a 189aa overlap with an ORF (ORF61a) from strain A of N.
meningitidis:



orf61.pep 



orf 61a 



10 20 30 

EISLRSDXRPVSVXKRRDSERFIXLDGGNS 

IMItll Mill I I I I I I I 1 I 1 I 1 1 I I I 
TVFEGTVKGVDGQGVLHLETAEGKQTWSGEISLRSDDRPVSVPKRRDSERFLLLDGGNS 

40 " 290 300 310 320 330 340 

40 50 60 70 80 90 

orf61 pep RIJ<WAW^NGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLAR 
orf 61 .pep | ill | | | | | | | | | I I I I I M 1 1 1 II I 1 M I II I s I I I I I I I I I I I I M I I I I M I Ml ' • 
4 < or£61a rlkwaWVENGTFATVGSAPYRDLSPI^AEWAEKVDGNVRIVGCAVCGEFKKAQVQEQLAR 

350 360 370 380 390 400 

100 110 120 130 140 150 

orfSl Pep kiewlpssaqaxgirnhyrhpeehgsdrwfnai^srrfsrnac\aa/scgtavtvdaltdd 
50 ortbi.pep km. w i 1 1 i n 1 1 j 1 1 1 1 1 1 i i i i 1 1 1 1 1 u 1 1 i 1 1 I 1 1 1 1 I I I II II 1 1 III 

410 420 430 440 450 460 

160 170 180 189 

55 orf61 pep GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT 

mm! | | 1 1 I I I I I 1 1 1 1 II I I I I I M I I I M I I I 1 1 I 
orf 61a GHYLG-GTIMPGFHLMKESLA\flRTANLNRHAGKRYPFPCT 

470 480 490 500 510 520 

orf61a HGRLKEKTGAGKPVDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGG
530 540 550 560 570 580

The complete length ORF61a nucleotide sequence <SEQ ID 235> is: 

1 ATGACGGTTT TGAAGCCTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA
51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC
101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG
151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT
201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA
251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG
301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGTG TGACCCACCT
351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG
401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT
451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGCC GGCGCGCCTT
501 GTCGCGTTTG GGTTTGAAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG
551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC
601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA
651 GGAAGTGGAA AACGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC
701 GGCGGGGAAA TGCCGATGCC GCCGTGTTGC TGGAAACGCT GTTGGCGGAA
751 CTTGATGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT
801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT
851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA
901 CAAGGCGTTC TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG
951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC
1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC
1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC
1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGTGGATG
1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATT CAAAAAGGCA
1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC
1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT
1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC
1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA
1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA
1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG
1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT
1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA
1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG
1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT
1701 GCGCGTGGCG GACAACCTCG TCATTCACGG GCTGCTGAAC CTGATTGCCG
1751 CCGAAGGCGG GGAATCGGAA CATACTTAA

This encodes a protein having amino acid sequence <SEQ ID 236>: 

1 MTVLKPSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG
51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY
151 ELGSLSPVAA VACRRALSRL GLKTQIKWPN DLVVGRDKLG GILIETVRTG
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE
251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG
301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KVDGNVRIVG CAVCGEFKKA
401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK
501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA
551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HT*

ORF61a and ORF61-1 show 98.5% identity in 591 aa overlap: 

                    10        20        30        40        50        60
orf61a.pep  MTVLKPSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR
            ||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf61-1     MTVLKLSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf61a.pep  LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf61-1     LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf61a.pep  GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLKTQIKWPN
            |||||||||||||||||||||||||||||||||||||||||||||||||||| :||||||
orf61-1     GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf61a.pep  DLVVGRDKLGGILIETVRTGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf61-1     DLVVGRDKLGGILIETVRTGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf61a.pep  AVLLETLLAELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG
            ||||||||:|||||||||||||||||||||||||||||||||||||||||||||||||||
orf61-1     AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf61a.pep  QGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf61-1     QGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf61a.pep  ATVGSAPYRDLSPLGAEWAEKVDGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL
            |||||||||||||||||||||:||||||||||||||||||||||||||||||||||||||
orf61-1     ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf61a.pep  GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF

1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 H it 1 1 H 1 1 1 1 i i h i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i i 

GIRN HYRH PEEHGS DRW FNALG SRRFSRN ACWVSCGT AVTVDALTDDGH YLGGT I MPG F 
430 440 450 460 470 480 

490 500 510 520 530 540 

HU4KESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP 

| 1 1 I t I t t I I I I t I I I I I 1 I t 1 I I I 1 I I I t I I I I I 1 I f I 1 I I t 1 I I I I 1 I > ■ 1 t I 

HU4KESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP 

490 500 510 520 530 540 

550 560 570 580 590 

VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHTX 

I I | M I H I I I It II I I I I I I M I M I I t I I I I M : I I I I : m I I I II 
VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHIX 
550 560 570 580 590 



Homology with a predicted ORF from N.gonorrhoeae 

ORF61 shows 94.2% identity over a 189aa overlap with a predicted ORF (ORF61.ng) from N. 
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EISLRSDXRPVSVXKRRDSERFLLLDGGNS 30 
Mill I I III II 11111111:1111 

TVCEGTVKGVDGRGVLHLETAEGEQTWSGEISLRPDNRSVSVPKRPDSERFLLLEGGNS 211 

RUWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLAR 90 

1 1 f | | | 1 1 1 I 1 I t I I I I 1 t I t I I I t 1 I } I I 1 I I t I I I I Mill 11111:11111 

RLKWAWVENGT FATVGS APYRDLS PLGAEWAEKADGNVRI VGCAVCGE S KKAQVKEQLAR 271 

KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDD 150 
1 1 I 1 1 1 1 1 1 1 I IIIIIIIIMIIIIIIIIIIIIIMIIUIIIIIIIMMIIIMIII 

KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRNACWVSCGTAVTVDALTDD 331 

GHYLGXGT IM PG FH121KE S LAVRTAN LNRHAGKR Y P FPT 189 
Mill II II II I II III M III II MM lllll I I II 

GHYIiG-GTIMPGFHLMKESLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCGSIMMM 3 90 
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An ORF61ng nucleotide sequence <SEQ ID 237> was predicted to encode a protein having amino 
acid sequence <SEQ ID 238>: 

1 MFSFGWAFDR PQYELGSLSP VAALACRRAL GCLGLETQIK WPNDLVVGRD 
51 KLGGILIETV RAGGKTVAVV GIGINFVLPK EVENAASVQS LFQTASRRGN 
101 ADAAVLLETL LAELGAVLEQ YAEEGFAPFL NEYETANRDR GKAVLLLRDG 
151 ETVCEGTVKG VDGRGVLHLE TAEGEQTVVS GEISLRPDNR SVSVPKRPDS 
201 ERFLLLEGGN SRLKWAWVEN GTFATVGSAP YRDLSPLGAE WAEKADGNVR 
251 IVGCAVCGES KKAQVKEQLA RKIEWLPSSA QALGIRNHYR HPEEHGSDRW 
301 FNALGSRRFS RNACVVVSCG TAVTVDALTD DGHYLGGTIM PGFHLMKESL 
351 AVRTANLNRP AGKRYPFPTT TGNAVASGMM DAVCGSIMMM HGRLKEKNGA 
401 GKPVDVIITG GGAAKVAEAL PPAFLAENTV RVADNLVIHG LLNLIAAEGG 
451 ESEHA* 

Further analysis revealed the complete gonococcal DNA sequence <SEQ ID 239> to be: 

1 ATGACGGTTT TGAAGCCTTC GCATTGGCGG GTGTTGGCGG AGCTTGCCGA 

51 CGGTTTGCCG CAACACGTAT CGCAATTGGC GCGTGAGGCG GACATGAAGC 

101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA TATACGCGGG 

151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CCTTGGCGGT 

201 TTTCGATGCC GAAGGTTTGC GCGATCTGGG GGAAAGGTCG GGTTTTCAGA 

251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 

301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT 

351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 

401 GCGAGTGCCT GATGTTCAGT TTCGGCTGGG CGTTTGACCG GCCGCAGTAT 

451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA CTTGCGTGCC GGCGCGCTTT 

501 GGGGTGTTTG GGTTTGGAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG 

551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACAGT CAGGGCGGGC 

601 GGTAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA 

651 GGAAGTGGAA AACGCCGCTT CCGTGCAGTC GCTGTTTCAG ACGGCATCGC 

701 GGCGGGGCAA TGCCGATGCC GCCGTATTGC TGGAAACATT GCTTGCGGAA 

751 CTGGGCGCGG TGTTGGAACA ATATGCGGAA GAAGGGTTCG CGCCATTTTT 

801 AAATGAGTAT GAAACGGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 

851 TGCGCGACGG CGAAACCGTG TGCGAAGGCA CGGTTAAAGG CGTGGACGGA 

901 CGAGGCGTTC TGCACTTGGA AACGGCAgaa ggcgaACAGa cggtcgtcag 

951 cggcgaaaTC AGccrGCggc ccgacaacaG GTCGGtttcc gtgccgaagc 

1001 ggccggatTC GgaacgtTTT tTGCtgttgg aaggcgggaa cagccgGCTC 

1051 AAGTGGGCGT GggtggAAAa cggcacgttc gcaaccgtgg gcagcgcgCc 

1101 gtaCCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG 

1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATC CAAAAAGGCA 

1201 CAAGTGAAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 

1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 

1301 CCGACCGTTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 

1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 

1401 TGACGGACAT TATCTCGGCG GAACCATCAT GCCCGGCTTC CACCTGATGA 

1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGCCC CGCCGGCAAA 

1501 CGTTACCCTT TCCCGACCAC AACGGGCAAC GCCGTCGCAA GCGGCATGAT 

1551 GGACGCGGTT TGCGGCTCGA TAATGATGAT GCACGGCCGT TTGAAAGAAA 

1601 AAAACGGCGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 

1651 GCGAAAGTCG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 

1701 GCGCGTGGCG GACAACCTCG TCATCCACGG GCTGCTGAAC CTGATTGCCG 

1751 CCGAAGGCGG GGAATCGGAA CACGCTTAA 

This corresponds to the amino acid sequence <SEQ ID 240; ORF61ng-1>: 

1 MTVLKPSHWR VLAELADGLP QHVSQLAREA DMKPQQLNGF WQQMPAHIRG 
51 LLRQHDGYWR LVRPLAVFDA EGLRDLGERS GFQTALKHEC ASSNDEILEL 
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWAFDRPQY 
151 ELGSLSPVAA LACRRALGCL GLETQIKWPN DLVVGRDKLG GILIETVRAG 
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE 
251 LGAVLEQYAE EGFAPFLNEY ETANRDHGKA VLLLRDGETV CEGTVKGVDG 
301 RGVLHLETAE GEQTVVSGEI SLRPDNRSVS VPKRPDSERF LLLEGGNSRL 
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGESKKA 
401 QVKEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA 
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRPAGK 
501 RYPFPTTTGN AVASGMMDAV CGSIMMMHGR LKEKNGAGKP VDVIITGGGA 
551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HA* 



ORF61ng-1 and ORF61-1 show 93.9% identity in 591 aa overlap: 
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MTVIJ<PSHWRVlJ\ElADGLPQHVSQIJtf^ 
^^LSHWRVI^IAIXaPQHVSQI^^ 60 



LVRPIJVVFDAEGIJtDLGERSGFQTAliCHECASSNDEILEI^IAPDKAHKTICVTHLQSK 120 

.t 1 1 1 I I 1 | I 1 I i I 1 1 M 1 I M I t I I I I I I M I I I I f 1 i t I I I I I I I 1 I 1 I I 1 1 I I I ( 
LVRP^VFDAEGLRELGERSGFQTAIJ^^ 120 



orf6lng-l.pep GRGRQGRKWSHRLGECIilFSrGWAFDRPQYELGSLSPVAAIACRRALGCIiGLETQI^PN 180 

f|I|||||l«IMIIIIIIIIII:||IIIIIMMIMII:|||||l: I H ^j"' ■ 
orf€l-l GRGRQGRKWSHRLGECLMFS FGWVFDR PQYE LG S LS PVAAVACRRALS RLGLDVQI KW PN 



180 



orf 61ng-l .pep 

orf61-l 

orf 61ng-l .pep 

orf61-l 

orf61ng-l.pep 

orf61-l 

orf61ng-l.pep 



orf61-l 



orf 61ng-l.pep 
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240 



DLWGRDKLGGILIETVRAGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA 

| I I I 1 1 1 1 1 II I II I II I : I 1 1 1 1 1 1 1 1 1 1 1 1 M II I I I I I 1 1 M I I I I I I I I 1 I Ml M 
DLWGRDKLGGILIETVRTGGKTVAWGIGINFVLPKEVENAASVQSLFQTASRRGNADA 

AVLLETLLAELGAVLEQYAEEGFAPFLNEYETANRDHGKAVLLLRDGETVCEGTVKGVDG 300 

Mil IM 1:1 I Ml lll::lllll: 1 1 : : M M M I II 1 1 M I II M MINI II I 
AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 300 



RGVLHXXTAEGEQTVVSGEISLRPDNRSVSVPKRPDSERFLLLEGGNSRLKWAWVENGTF 

: i 1 I 1 1 K t I I 1 1 I I I I I I 1 1 I t I 1:1 HUM M II M M : M II M II I II M I M 
QGVI^LETAEGKQTWSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 

ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLARKIEWLPSSAQAL 

1 | | | | 1 t t I I I t 1 I I t 1 I 1 1 I 1 ! I I 1 I I t t I II I II : I I M M I I M I I I I M I 

ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL 

GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDDGHYLGGTIMPGF 
I I I I 1 1 I I I I I M I I M I M M M M II 1 1 1 1 I M M M II I II I M I M M M I II I II 
GIRNHYRHPEEHGSDRWFNALGSRRFSRNACWVSCGTAVTVDALTDDGHYLGGTIMPGF 
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HLMKES1AVRTANLNRPAGKRYPFPTTTGNAV7VSGMMDAVCGSIMMMHGRLKEKNGAGKP 54 0 

I I t I M 11 I II I I I I I II I II I 1 1 II 1 1 1 II I II I II I M M : I M I I II II I : II I I I CAn 
HU4KES1ATOTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP 540 

orf61na-l pep VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHAX 593 

I 1 I I II M I M M M M M I I I M I H I M I 1 11 I : M I I : I I M I III 
orf 61-1 V DV 1 1 TGGGAAKVAEALPPAFLAENTVRVADNLV I YGLLNM I AAEGRE YEHI X 593 

Based on this analysis, including the homology with the baf protein of B.pertussis and the presence 
of a putative prokaryotic membrane lipoprotein lipid attachment site, it is predicted that these 
proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 
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The following partial DNA sequence was identified in Kmeningitidis <SEQ ID 241>: 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 
51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 
101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 
151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 
201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 
251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 
351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 
401 CGGaAGAGGG CGGCGaAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 
501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 
601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGC . . 



This corresponds to the amino acid sequence <SEQ ID 242; ORF62>: 
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1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 
51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 
101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 
151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 
201 WSVGMVLSLL YLGLGC. . 

Further work revealed the complete nucleotide sequence <SEQ ID 243>: 

1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 
51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 
101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 
151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 
201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 
251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 
351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 
401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 
501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 
601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGCGG 
651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 
701 ATGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG 
751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG CCTTGGGCGT 
801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA 
851 AATAA 

This corresponds to the amino acid sequence <SEQ ID 244; ORF62-1>: 

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 
51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 
101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 
151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 
201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL 
251 AVLILGEHLS PVSALGVFVV IAATLVAGRL SHQK* 
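
The correspondence between a nucleotide listing and the amino acid sequence given for it can be checked mechanically. The following is a minimal Python sketch, assuming the standard genetic code and translation in frame from the first base. The names `CODON_TABLE` and `translate` are illustrative and are not taken from the original text. Codons containing ambiguity symbols or unread bases are reported as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 1; whitespace and case are ignored."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Anything that is not a plain A/C/G/T codon becomes X.
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases and last 15 bases of the ORF62-1 nucleotide listing.
print(translate("ATGTTTTACC AAATCCTTGC CCTGATTATC"))  # MFYQILALII
print(translate("TCGCATCAAA AATAA"))                  # SHQK*
```

The terminal asterisk marks the stop codon, matching the convention used in the amino acid listings.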

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical transmembrane protein HI0976 of H. influenzae (accession number Q57147) 

ORF62 and HI0976 show 50% aa identity in 114aa overlap: 

Orf62 1 mfyqilALIIWSSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60 
 M YQILAL+IWSSS IKY +DP L+V VR R KI + K 
HI0976 1 MLYQILALLIWSSSLIVGKLTYSMMDPVLVVQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60 

Orf62 61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114 
 L ++F NY LLQF+GLKYTS A+ SA ++GLEPLL+VFVGHFFF K + 
HI0976 61 LVWIAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLVVFVGHFFFKTKQNGF 114 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF62 shows 99.5% identity over a 216aa overlap with an ORF (ORF62a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf62 Deo McvrtTT.AT TTW^^^FTAAKYVYGGID PAI^IVGVRLLIAALPAL PACRRHVGKI PREEWKP 

orf62 ' p p imiiniiiniix n 1 1 1 1 1 1 ,mmi in Milium 1 1 1 1 1 1 1 1 1 1 1 1 1 n i 

o - f 62 a MFYQI LALIIWSSSFIA AKYVYGGI DPAI^VGVRLLIAALPAL PACRRHVGKI PREEWKP 
50 ~ 10 20 30 40 50 60 

70 80 90 100 HO 120 

orf62 pep IXTVSF VNWLTLLLQFV GIJCYTS AASASVIVGLEPLI^VFV GHFFFNDKARAYHWICGA 
* P P HI 1 IIMI I I I Ml M I I | I I | 111 | I I I I II I I I 1 I I I I IN I I M I II III I II II I 
55 orf62a LLIVS FVNYVLTLLLQFV GUCrrS AA^ASVIVGI^PLLMVFV GHFFFNDKAJ^YHWITJ^ 
JJ ------ _ qo go 10Q 110 12Q 

130 140 150 160 170 180 

orf 62 . pep AAFAGVALI>1AGGA£EGGEVGWFGCLLVLIAGA 
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| 1 I I I I 1 1 I 1 I 1 I I I I I I I I S I I I I 1 I I I I i t I I I | | | 1 | t IIIMIIIIIIMI 

130 140 1*0 160 I 1 * 180 

190 200 210 
orf62 pep AASLMCLPFSIAIA QSYTVDWSVGMVLSLLYLGLGC 
lilt! I I! MM Ml III 111 IMMMIMM:!! 
orf62a AASI^C LPFSIAIA QSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI 
190 200 210 220 230 240 

orf 62a SLEPWGVLIAVLI LGEHLSPVSVLGVFWIAATLVAGRLSHQKX 
" 250 260 270 280 

The complete length ORF62a nucleotide sequence <SEQ ID 245> is: 
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20 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 
51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 
101 GCCTGCTGAT TGCTGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 
151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 
201 CAACTATGTG CTGACCCTGC TACTTCAGTT TGTCGGGTTG AAATACACTT 
251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCACT GCTGATGGTG 
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 
351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 
401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 
501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 
601 TGGAGCGTCG GAATGGTATT GTCGCTGCTG TATTTGGGCG TGGGGTGCAG 
651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 
701 ACGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG 
751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG TCTTGGGCGT 
801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA 
851 AATAA 



This encodes a protein having amino acid sequence <SEQ ID 246>: 

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 
51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 
101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 
151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 
201 WSVGMVLSLL YLGVGCSWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL 
251 AVLILGEHLS PVSVLGVFVV IAATLVAGRL SHQK* 



ORF62a and ORF62-1 show 98.9% identity in 284 aa overlap: 



orf62a.pep  MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60 
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60 

orf62a.pep  LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120 
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120 

orf62a.pep  AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180 
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180 

orf62a.pep  AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI 240 
            ||||||||||||||||||||||||||||||||| || |||||||||||||||||||||||
orf62-1     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI 240 

orf62a.pep  SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQKX 285 
            ||||||||||||||||||||||| |||||||||||||||||||||
orf62-1     SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQKX 285 
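
The percent-identity figures quoted for the pairwise comparisons in this section can be recomputed from any gap-free aligned pair. The sketch below is illustrative only: the function names are hypothetical, the overlap is taken to be the number of aligned columns, and the alignment software behind the original figures may treat gaps and terminal residues differently.

```python
def identity(seq_a, seq_b):
    """Return (matches, overlap) for two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # A gap character aligned to a gap character is not counted as a match.
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return matches, len(seq_a)


def percent_identity(matches, overlap):
    """Express matches over overlap as a percentage with one decimal place."""
    return round(100.0 * matches / overlap, 1)


# Residues 211-220 of ORF62a and ORF62-1 as listed in this section.
m, n = identity("YLGVGCSWYA", "YLGLGCGWYA")
print(m, n, percent_identity(m, n))  # 8 10 80.0

# Three mismatches over a 284 aa overlap reproduce the quoted 98.9%.
print(percent_identity(284 - 3, 284))  # 98.9
```

The count of three mismatches is read off the two amino acid listings and is an assumption of this example, not a figure stated in the original text.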



Homology with a predicted ORF from N. gonorrhoeae 

ORF62 shows 99.5% identity over a 216aa overlap with a predicted ORF (ORF62.ng) from N. 
gonorrhoeae: 



orf62.pep   MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60 
            ||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng     MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60 

orf62.pep   LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120 
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120 

orf62.pep   AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180 
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180 

orf62.pep   AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC 216 
            ||||||||||||||||||||||||||||||||||||
orf62ng     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI 240 



The complete length ORF62ng nucleotide sequence <SEQ ID 247> is: 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGGGCAGCT CGTTTATTGC 
51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 
101 GCCTGCTGAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 
151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 
201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 
251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 
351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 
401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 
501 CCGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 
601 TGGAGCGTCG GGATGGTATT GTCGCTGTTG TATTTGGGTT TGGGGTGCGG 
651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 
701 ACGCGTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGTTG 
751 GCGGTTTTGA TTTTGGGCGA ACATTTATCG CCCGTGTCCG CCTTGGGCGT 
801 GTTTGTCGTC ATCGCCGCCA CTTTCGCCGC CGGCCGGCTG TCGCGCAGGG 
851 ACGCGCAAAA CGGCAATGCC GTCTGA 



This encodes a protein having amino acid sequence <SEQ ID 248>: 

1 MFYQILALII WGSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 
51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 
101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 
151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 
201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANASGLLI SLEPVVGVLL 
251 AVLILGEHLS PVSALGVFVV IAATFAAGRL SRRDAQNGNA V* 



ORF62ng and ORF62-1 show 97.9% identity in 283 aa overlap: 






10 20 30 40 50 60 

orf62na pep MFYQILALI IWGS SFIAAKYVYGGI DPALMVGVRLLIAALPALPACRRHVGKI PREEWK P 
| || | || |M ||:|l ill II II Ml II Ml III Mill II I III I I 1 1 I I I 1 i 1 HI MM 
orf 62-1 MFYQIIJU-I I WSSSFIAAKYVYGGI DPALMVGVRLLIAALPALPACRRHVGKI PREEWKP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf62ra pep LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 
* y * H IIIMIIIIMIMMMMMMMHIMMIIMIHIIIIIIMIMMMMMI 
orf 62-1 IXIVSFVNYVLTLLLQFVGLKYTSAASA^VIVGLEPLLMVFVGHFFFNDKARAYHWICGA 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf62nq pep AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 

| | | | | | I I M M M M I I II I II I M M II II I I I M I II I I II I I I M M I I M 

orf 62-1 AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf62na Deo AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI 

**' P P mill Illllll I I I I I I I I I I I I I I I M I I I I I I I I M I I I I I I I I: I I II I 

orf 62-1 AA5LMCLPFSIAIJVQSYTVDWSVGMVLSLLYLGIXSCGWYAYWLWNKGMSRV?A>A^SGLLI 

190 200 210 220 230 240 




250 260 270 280 290 

or f 62nq . pep SLE PWGVLLAVLI LGEHLS PVSALGVFWIAAT FAAGRLSRRDAQNGNAVX 
I I 1 1 1 I 1 I I 1 | I 1 1 1 1 1 1 1 I I i 1 I I 1 1 1 1 1 I I I I := I I I I I ^ : 
5 orf62-l SLE PWGVLLAVLI LGEHLS PVSALGVFWIAAT LVAGRLSHQKX 

250 260 270 280 

Furthermore, ORF62ng shows significant homology to a hypothetical H.influenzae protein: 

sp|Q57147|Y976_HAEIN HYPOTHETICAL PROTEIN HI0976 >gi|1074589|pir||B64163 
hypothetical protein HI0976 - Haemophilus influenzae (strain Rd KW20) 
>gi|1574004 (U32778) hypothetical [Haemophilus influenzae] Length = 128 

Score = 106 bits (262), Expect = 2e-22 

Identities = 56/114 (49%), Positives = 68/114 (59%) 

Query: 1 MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60 
 M YQILAL+IW SS I K Y +DP L+V VR R KI + K 
Sbjct: 1 MLYQILALLIWSSSLIVGKLTYSMMDPVLVVQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60 

Query: 61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114 
 L ++F NY LLQF+GLKYTSA+SA ++GLEPLL+VFVGHFFF K + 
Sbjct: 61 LWWIAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLVVFVGHFFFKTKQNGF 114 

Based on this analysis, including the homology with the transmembrane protein of H.influenzae 
and the putative leader sequence and several transmembrane domains in the gonococcal protein, 
it is predicted that these proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 
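
The text does not say which method was used to predict the transmembrane domains. As one possible illustration, the sketch below computes a sliding-window hydropathy profile with the published Kyte-Doolittle scale, a technique substituted here for the unspecified original analysis. The 19-residue window is an assumption, and unrecognised symbols such as X are scored as 0.0 by choice of this example.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# First 20 residues of ORF62 as listed in this section: two full windows.
profile = hydropathy_profile("MFYQILALIIWSSSFIAAKY")
for start, value in enumerate(profile, start=1):
    print(start, round(value, 2))
```

Windows with a high mean value are candidate membrane-spanning segments; any cut-off applied to the profile is a further assumption and is not taken from the original text.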

Example 30 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 249>: 

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCmGwms TCCTGkkGTA 

51 sGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 

201 CGGTTCGgtA srTyGCCAAA gsGCCTgkks TGGG.ATGTT TACGCTGGTT 

251 GCCGkACTGC CCGGCGTGTT TCTGTTCGGC TTTCCCGCAC AGTTCATCAA 

301 CGGCACGATT AATTCGTGGT TCGGCAACGA TACCCACGAG GCGCTTGAAC 

351 GCAGCCTCAA TTTGAGCAAG TCCGCATTGA ATTTGGCGGC AGACAACGCC 

401 CTCGGCAACG CCGTCCCCGT GCAGATAGAC CTCATCGGCG CGGCTTCCCT 

451 GCCCGGGGAT ATGGGCAGGG TGCTGGAACA TTACGCCGGC AGCGGTTTTG 

501 CCCAGCTTGC CCTGTACAAy ksCGCAAGCG GCAAAATCGA AAAAAGCATC 

551 AACCCGCACA AGCTCGATCA GCCGTTTCCA GGTAAGGCGC GTTGGGAaAa 

601 AATCCaACGG GCGGGTTCGG TCAGGGATTT GGAAAGCATA GGCGGCGTAT 

651 TGTaCGCGCA GGGCTGGCTG TCGGCGGGTA CGCACwACGG GCGCGATTAC 

701 GCCTTGTTTT TCCGTCAGCC GGTTCCCAAA GGCGTGGCAG AGGATGCCGT 

751 yTTAATCGAA AAGGCAAGGG CGAAATATGC TGAGTTGAGT TACAGCAAAA 

801 AAGGTTTGCA GACCTTTTTC CTGGCAACCC TGCTGATTGC CTCGCTGCTG 

851 TCGATTTTTC TTGCACTGGT CATGGCACTG TATTTCGCCC GCCGTTTCGT 

901 CGAACCCGTC CTATCGCTTG CCGAGGGGGC GAAGGCGGTG GCGCAAGGCG 

951 ATTTCAGCCA GACGCGCCCC GTGTTGCGCA ACGACGAGTT CGGACGCTTG 

1001 ACCArGTTGT TCAACCACAT GACCGAGCAG CTTTCCATCG CCAAAGATGC 

1051 AGACGAGCGC AACCGCCGGC GCGAGGAAGC CGCCAGGCAT TATCTTGAAT 

1101 GCGTGTTGGA GGGGCTGACC ACGGGCGTGG TGGTGTTTGA CGAACAAGGC 

1151 TGTCTGAAAA CCTTCAACAA AGCGGCGGGT ACC. . 

This corresponds to the amino acid sequence <SEQ ID 250; ORF64>: 

1 MRRFLPIAAI CAXXLXXGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV 
51 LARYVILLLK DRRDGVFGSX XAKXPXXXMF TLVAXLPGVF LFGFPAQFIN 
101 GTINSWFGND THEALERSLN LSKSALNLAA DNALGNAVPV QIDLIGAASL 
151 PGDMGRVLEH YAGSGFAQLA LYNXASGKIE KSINPHKLDQ PFPGKARWEK 
201 IQRAGSVRDL ESIGGVLYAQ GWLSAGTHXG RDYALFFRQP VPKGVAEDAV 
251 LIEKARAKYA ELSYSKKGLQ TFFLATLLIA SLLSIFLALV MALYFARRFV 
301 EPVXSLAEGA KAVAQGDFSQ TRPVLRNDEF GRLTXLFNHM TEQLSIAKDA 
351 DERNRRREEA ARHYLECVLE GLTTGVVVFD EQGCLKTFNK AAGT . . 

Further work revealed the complete nucleotide sequence <SEQ ID 251>: 



1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA 
51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 
101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 
151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 
201 CGGTTCGCAG ATTGCCAAAC GCCTTTCTGG GATGTTTACG CTGGTTGCCG 
251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT CATCAACGGC 
301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG 
351 CCTCAATTTG AGCAAGTCCG CATTGAATTT GGCGGCAGAC AACGCCCTCG 
401 GCAACGCCGT CCCCGTGCAG ATAGACCTCA TCGGCGCGGC TTCCCTGCCC 
451 GGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA 
501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC 
551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC 
601 CAACGGGCGG GTTCGGTCAG GGATTTGGAA AGCATAGGCG GCGTATTGTA 
651 CGCGCAGGGC TGGCTGTCGG CGGGTACGCA CAACGGGCGC GATTACGCCT 
701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA 
751 ATCGAAAAGG CAAGGGCGAA ATATGCTGAG TTGAGTTACA GCAAAAAAGG 
801 TTTGCAGACC TTTTTCCTGG CAACCCTGCT GATTGCCTCG CTGCTGTCGA 
851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA 
901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT 
951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA 
1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 
1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGGCATTATC TTGAATGCGT 
1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC 
1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC 
1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 
1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG 
1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG 
1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACGGCAACG GCGTGGTAAT 
1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT 
1451 GGGGCGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG 
1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 
1551 GGATGAGCAG GATGCGCAAA TCCTGACGCG TTCGACCGAC ACCATCGTCA 
1601 AACAGGTGGC GGCATTGAAG GAAATGGTCG AAGCATTCCG CAATTATGCG 
1651 CGTTCCCCTT CGCTCAAATT GGAAAATCAG GATTTGAACG CCTTAATCGG 
1701 CGATGTGTTG GCATTGTATG AAGCCGGTCC GTGCCGGTTT GCGGCGGAGC 
1751 TTGCCGGCGA ACCGCTGACG GTGGCGGCGG ATACGACCGC CATGCGGCAG 
1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 
1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAAC AGGGCAGGAC GGTCGGATTG 
1901 TCCTGACGGT TTGCGACAAC GGCAAAGGGT TCGGCAGGGA AATGCTGCAC 
1951 AACGCCTTCG AGCCGTATGT AACGGACAAA CCGGCGGGAA CGGGATTGGG 
2001 TCTGCCTGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CGCATCAGCC 
2051 TGAGCAATCA GGATGCGGGT GGCGCGTGTG TCAGAATCAT CTTGCCAAAA 
2101 ACGGTAAAAA CTTATGCGTA G 



This corresponds to the amino acid sequence <SEQ ID 252; ORF64-1>: 

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV 
51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING 
101 TINSWFGKDT HEALERSLNL SKSALNLAAD NALGNAVPVQ IDLIGAASLP 
151 GDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI 
201 QRAGSVRDLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPV PKGVAEDAVL 
251 IEKARAKYAE LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE 
301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD 
351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT 
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL 
451 LGKATVLPED NGNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT 
501 PIQLSAERLA WKLGGKLDEQ DAQILTRSTD TIVKQVAALK EMVEAFRNYA 
551 RSPSLKLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLT VAADTTAMRQ 
601 VLHNIFKNAA EAAEEADVPE VRVKSETGQD GRIVLTVCDN GKGFGREMLH 
651 NAFEPYVTDK PAGTGLGLPV VKKIIEEHGG RISLSNQDAG GACVRIILPK 
701 TVKTYA* 



Computer analysis of this amino acid sequence gave the following results: 






Homology with a predicted ORF from N. meningitidis (strain A) 

ORF64 shows 92.6% identity over a 392aa overlap with an ORF (ORF64a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 



orf 64 .pep 



^RFLPIAAICAXXI^GLTAATGSTSSIJU )YFWWIVAFSAM LLLVLSAVIARYVILLL K 
I | | | | | | | | | | | j I t I I I I I I t M I I I I ! I | | | | || I I I I I I M I I I I I I II II I I 
or f64a MRRFLPIAAICAWLLYGLTAATGSTSSLA DYFWWIVAFSAM LLLVLSAVLARYVILLL K 
10 20 30 40 50 60 



10 70 80 90 100 110 120 

or f64.pep DRRDGVFGSXXAKXPX XXMFTLVAXLPGVFXFG FPAQFINGTINSWFGNDTHEALERSLN 
iimnii II Mllll llltllll I I t I I I 1 1 f I I I I 1 I I 1 I I I I I I 1 i 

0rf64a nRRDGVFGSQIAKR-LS GMFTLVAVLPGVFLFGV SAQFINGTINSWFGNDTHEALERSLN 
70 80 90 100 110 

15 130 140 150 160 170 180 

orf64 pep LSKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNXASGKIE 
| | | | | | I M M | I I 1 I I : I I I I I I I I I I U I I M I 1 I 1 II I I I I I I I I I I I I I I I I I 
orf 64a LSKSALNLAADNALGNAI PVQIDXIGAASLPXDMGRVLEHYAGSGFAQLALYNAASGKIE 

20 120 130 140 150 160 170 

190 200 210 220 230 240 

orf 64 .pep KSINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHXGRDYALFFRQP 
MIIMMIMIMI Ml IIIMKIMI 1 1 I I I I I I 1 Mill II t 1 1 I I I 1 I t t I 
25 orf 64a KSINPHKLDQPFPGKARWEKIQQAGSVRDXESIGGVLYAXGWLSAXTHNGRDYALFFRQP 

180 190 200 210 220 230 

250 260 270 280 290 300 

orf 64 . pep v pkctr F. n a VI ,T EKARAKY AEL S Y S KKG LOTFFLAT LL I AS LLS I FLALVMAL Y FARR FV 
30 i ^ I I | I I I I I I I I I I I I I I 1 I I I I I I I I I I I I 1 I I I I I I 1 i M I It I I I I I I 1 It I I I 

orf 64 a VPyf^AEDAVLIEKAJUOCXXXLSYSKKGLQTFFLAT LLIASLLSIFIiALVMALY FARRFV 
240 250 260 270 280 290 

310 320 330 340 350 360 

35 orf 64 pep EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA 

i I I t 1 1 I I 1 I I I i I I I I 1 I 1 I I 1 I I I I I I I I I 1 I IN II III IIIIMIIIIH 111 I I 
orf 64a EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA 
300 310 320 330 340 350 

40 370 380 390 

orf 64 . pep ARHYLECVLEGLTTGWVFDEQGCLKTFNKAAGT 

1 1 | I I I II I I I I I I I I M I I I I I I I I I I I I I I 
orf 64a ARHYIiECVLEGLTTGVWFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSL 
360 370 380 390 400 410 

45 

orf 64a iaevfaaigaaagtdkpvhvkyaapddakillgkatvlpednxngvvmvidditvlihaq 

420 430 440 450 460 470 

The complete length ORF64a nucleotide sequence <SEQ ID 253> is: 

1 atgcgccgtt ttctaccgat cgcagccata tgcgccgtcg tcctgttgta 

50 51 cggactgacg gcggcaaccg gcagcaccag ttcgctggcg gattatttct 

101 ggtggattgt tgcgttcagc gcaatgctgc tgctggtgtt gtccgccgtt 

151 ttggcacgtt atgtcatatt gctgttgaaa gacaggcgcg acggcgtatt 

201 cggttcgcag attgccaaac gcctttccgg gatgtttacg ctggttgccg 

251 tactgcccgg cgtgtttctg ttcggcgttt ccgcacagtt tatcaacggc 

55 301 acgattaatt cgtggttcgg caacgatacc cacgaggcgc ttgaacgcag 

351 cctcaatttg agcaagtccg cattgaatct ggcggcagac aacgcccttg 

401 gcaacgccat ccccgtgcag atagacntca tcggcgcggc ttccctgccc 

451 ngggatatgg gcagggtgct ggaacattac gccggcagcg gttttgccca 

501 gcttgccctg tacaatgccg caagcggcaa aatcgaaaaa agcatcaacc 

60 551 cgcacaagct cgatcagccg tttccaggta aggcgcgttg ggaaaaaatc 

601 caacaggcgg gttcggtcag ggatnnggaa agcataggcg gcgtattgta 

651 cgcgcanggc tggctgtcgg cagnnacgca caacgggcgc gattacgcct 

701 tgtttttccg tcagccggtt cccaaaggcg tggcagagga tgccgtctta 

751 atcgaaaagg caagggcgna anannntnag ttgagttaca gcaaaaaagg 

65 801 tttgcagacc tttttcctng caaccctgct gattgcctcn ctgctgtcga 

851 tttttcttgc actggtcatg gcactgtatt tcgcccgccg tttcgtcgaa 
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15 



901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT
951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA
1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC
1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGACATTATC TCGAATGCGT
1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC
1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC
1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA
1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG
1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG
1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACNGCAACG GCGTGGTAAT
1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT
1451 GGGGCGAAGT GGCAAAACGG CTGGCACACG AAATCCGCAA TCCGCTCACG
1501 CCCATCCAGC TTTCTGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT
1551 GGACGAGCAN GACGCGCAAA TCCTGACACG TTCGACCGAC ACCATCATCA
1601 AACAAGTGGC GGCATTAAAA GAAATGGTCG AGGCATTCCG CAATTACNCG
1651 CGTTCCCCTT CGNCTCAATT GGAAAATCAG GATTTGAACG CCTTAATCGG
1701 CGATGTGTTG GCATTGTACG AAGCTGGTCC GTGCCGGTTT GCGGCGGAAC
1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG
1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA
1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAGC GGGGCAGGAC GGACGGATTG
1901 TCCTGACAGT TTGCGACAAC GGCAAGGGGT TCGGCAGGGA AATGCTGCAC
1951 AATGCCTTCG AGCCGTATGT AACGGACAAA CCGGCTGGAA CGGGATTGNG
2001 ACTGCCCGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CNCATCAGCC
2051 TGAGCAATCA GGATGCGGGC GGCGCGTNTG TCAGAATCAT CTTGCCAAAA
2101 ACGGTAGAAA CTTATGCGTA G




This encodes a protein having amino acid sequence <SEQ ID 254>: 
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35 



1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV
51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING
101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAIPVQ IDXIGAASLP
151 XDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI
201 QQAGSVRDXE SIGGVLYAXG WLSAXTHNGR DYALFFRQPV PKGVAEDAVL
251 IEKARAXXXX LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE
301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL
451 LGKATVLPED NXNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT
501 PIQLSAERLA WKLGGKLDEX DAQILTRSTD TIIKQVAALK EMVEAFRNYX
551 RSPSXQLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLM MAADTTAMRQ
601 VLHNIFKNAA EAAEEADVPE VRVKSEAGQD GRIVLTVCDN GKGFGREMLH
651 NAFEPYVTDK PAGTGLXLPV VKKIIEEHGG XISLSNQDAG GAXVRIILPK
701 TVETYA*
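
The relationship between the unresolved bases (n) in <SEQ ID 253> and the X residues in <SEQ ID 254> can be illustrated with a minimal sketch. It assumes the standard genetic code and reading frame 1, and it emits X whenever the possible codons do not all encode the same residue. The function name `translate` and the IUPAC expansion table are illustrative choices, not part of the original disclosure, and the software actually used to derive the listed sequences is not stated.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes (assumed notation for n, y, m, w, r, ...).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate(dna: str) -> str:
    """Translate in frame 1; emit X when a codon's residue is ambiguous."""
    seq = "".join(dna.split()).upper()  # drop the spaces between 10-mers
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        # Expand the codon into every unambiguous codon it could stand for.
        residues = {
            CODON_TABLE["".join(bases)]
            for bases in product(*(IUPAC[base] for base in codon))
        }
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


if __name__ == "__main__":
    # Start of <SEQ ID 253> as printed above.
    print(translate("atgcgccgtt ttctaccgat"))
    # Codons around the unresolved base at position 427 of <SEQ ID 253>.
    print(translate("ATAGACNTCATC"))
```

Under these assumptions an ambiguous base only produces X when it changes the encoded residue; a codon such as GCN still translates to A because all four expansions encode alanine.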



ORF64a and ORF64-1 show 96.6% identity in 706 aa overlap: 
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orf64a.pep 
orf64-l 



orf64a.pep 
orf64-l 



orf64a.pep 
orf64-l 



orf 64a. pep 
orf64-l 



10 20 30 40 50 60 

l^RFL^IAAICAWLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 

^RFLPIAAICAWLLY^ 

10 20 30 40 50 



60 



70 80 90 100 110 120 

DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 

| | | | | | 1 1 I I I 1 I M I ! I I 1 1 1 1 1 1 H 1 1 I I 1 1 I F 

iVPT.QfiMPTT.VAVLPGVFLFGVSAuj* x n\* 1 1 i^w c * m***-^*-*^ 

120 



1 I I I I I I I I t 1 I I I I I I 1 t I I I I I I 

drA^vfgsqiak^sgmH'lvavlp 

70 80 90 100 



110 



170 



180 



130 140 150 160 

sksalniaadnalgnaipvqidxigaaslpxdmgrvlehyagsgfaqlalynaasgkiek 

mimnuMiiiHnii iimii iiiiiiimii»mhiiiimmihii 
s^^n^nai^avpvqidli(^slpgdmgrvi^hyagsgfaqialynaasgkiek 

130 140 150 160 170 180 



160 



170 



230 



240 



190 200 210 220 

sinphkldqpfpgkarwekiqqagsvrdxesiggvlyaxgwlsaxthngrdyalffrqpv 

mt^MI HMMIIMIrlMUl I I M I 1 M I Mill I I I I I I I I I 1 1 1 Ml 
sinph^dqpfpgkarwekiqragsvrdlesiggvlyaqgwlsagthngrdyalffrqpv 

190 200 210 220 230 



220 



240 



250 



260 



270 



280 



290 



300 
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or f 64 a pep PKGVAEDAVLIEKARAXXXXLSYSKKGLQTFF1ATLLIASL1^IF1JU#VMALYFARRFVE 
H I f I M J I 1 1 | | i i i 1 I 1 I I I t I I ! | i | t 1 1 | 1 1 | | I I 1 I I I I 1 1 I 1 1 I t I I I 1 I I I 
orf64-l PKGVAEDAVL I EKARAKY AE LS Y SKKG LQT FFLAT L L I AS LL S I FLALVMAL Y FARRFVE 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 64a . pep PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 
I 1 I I 1 I I t I I I 1 I I t 1 1 I 1 t t 1 I ! I t 1 1 I I I I I 1 I I I 1 I I t I I I I 1 I I I t I I J t I I 1 I I 1 
orf 64 - 1 PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 
310 320 330 340 350 360 

370 380 390 400 410 420 

orf 64a pep RHYLECVLEGLTTGVWFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 
| ! I I M I 1 1 1 1 1 I I I 1 I I I I I M 1 1 1 1 11 I 1 1 1 1 1 1 1 1 1 I I 1 M 1 1 1 1 1 1 11 I I I I 1 1 1 1 
orf 64-1 RHYLECVLEGLTTGVWFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf64a pep AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNXNGWMVIDDITVLIHAQK 
| 1 I 1 I I t ) 1 I t I I I 1 ( I 1 1 I I 1 I i I 1 1 I 1 I t t ! 1 I I t I I I I 1 III I I I I I II II II I I I 
orf 64-1 AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGWMVIDDITVLIHAQK 
430 440 450 460 470 480 

490 500 510 520 530 540 

orf 64a . pep EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEXDAQILTRSTDTIIKQVAALK 
I I I ! I 1 1 I I I 1 I I I I I 1 I I I 1 1 I 1 I I i I I I I I 1 1 1 1 K I 1 1 t I I I I 1 I I 1 I I = 1 I I 1 1 1 1 
orf 64-1 EAAWGEVAKRLAHE I RN PLT PI QLS AERLAWKLGGKLDEQDAQI LTRSTDT I VKQVAALK 

490 500 510 520 530 540 

550 560 570 580 590 600 

orf64a pep EMVEAFRNYXRS PSXQLENQDLNAL I GDVLAL YEAG PCRFAAE LAGE PLMMAADTT AMRQ 
lllllllll I I I I t | M 1 M I I I M t I I M I I I I I I I I I > I I I I I I t r 1 I I I 1 1 I I 1 
orf 64-1 EMVEAFRNYARSPSLKLENQDLNALI GDVLAL YEAG PCRFAAELAGEPLTVAADTTAMRQ 

550 560 570 580 590 600 

610 620 630 640 650 660 

orf 64a . pep VLHNIFKNAAEAAEEADVPEVRVKSEAGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 
t I I I I t I I I I 1 M I I I I I I I I I I I I I: I I 1 I I M I I t II I I M I I I I I I I I I I I I I i I I I 
orf 64-1 VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 
610 620 630 640 650 660 

• 670 680 690 700 

orf 64a .pep PAGTGLXLPWKKIIEEHGGXISLSNQDAGGAXVRIILPKTVETYAX 
IMIII II I II I I I I I I I I I I I I II I I I I I 111111111:1111 
orf 64-1 PAGTGLGLPWKKI IEEHGGRI SLSNQDAGGACVRI ILPKTVKT YAX 

670 680 690 700 

Homology with a predicted ORF from N. gonorrhoeae

ORF64 shows 86.6% identity over a 387aa overlap with a predicted ORF (ORF64.ng) from N. 
gonorrhoeae: 

orf 64. pep 

orf64ng 

orf 64 .pep 

orf 64ng 

orf 64 .pep 

orf 64ng 

orf 64 .pep 

orf 64ng 



MRRFLPI AAI CAXXLXXGLTAATGST S S LADY FWW I VAFS AMLLLVLS AVLAR YV I LLLK 60 
| | I | | | M II I I I I I I I I I I I I I I I M M I M I: I I M II I I I II I I I I I I I I I II 

MRRFLPIAAI CAWLLYGLTAATG ST S S LAD YFWW I VS FS AMLLLVLS AVLAR YV I LLLK 60 

DRRDGVFGSXXAKXPXXXMFTLVAXLPGVFLFGFPAQFINGTINSWFGNDTHEALERSLN 120 

|||:1M|| || MUM llhllll: I I I 11 I I I I I I I I I I M I I I 1 I I I I 

DRRNGVFGSQIAKR-LSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLN 119 

LSKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNXASGKIE 180 
| | I I I I : I I I I I !:: I I I I I I I I I I I : M I I : I I I I I I I M I I I I I I I I I I I I I I I I 

LSKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAGSGFAQLALYNAASGKIE 179 

KS INPHKLDQPFPGKARWEKIQRAGSVRDLES IGGVLYAQGWLSAGTHXGRDYALFFRQP 240 
|||M|::1I|:I I : 1 1 : I I : : I M I : I I II 1 1 1 1 1 1 II I I I I 1 1 I 11111111111 

KSINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQP 239 
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orf 64. pep 
orf64ng 
orf 64. pep 
orf64ng 
orf 64. pep 
orf 64ng 
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VPKGVAEDAVLIEKABAKYAELS YSKKGLQT FFLATLLIASLLS I FLALVMALY FARR^V 300 

= i = = 1 1 r 1 1 1 it 1 1 1 1 1 1 1 1 1 « ■ 1 1 nun i iiii in ii Milium in 

IPENVAQDAVLIEKARAKYAELSYSKK^ 

EPVLSIAEGAKAVAQGDFSQTRPVL^DEFGRLTXLFNHMTEQLSIAKDADERNRRREEA 

||:tlt||||||||||llimililltlll||| I I I I I I I I • ■ 1 I ■ s I I 1 I I I I I Ml 
EPILSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA 

ARH YLECVLEGLTTG WVFDEQGCLKT FNKAAGT 394 

i ii i in 1 1 mm in n n „ vr Ann 

ARHYLECVLDGLTTGVWSYPLSCCRTAVFSTCHSSPLSYF 400 



299 



360 



359 



An ORF64ng nucleotide sequence <SEQ ID 255> was predicted to encode a protein having amino
acid sequence <SEQ ID 256>:



1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV
51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING
101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS
151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI
201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL
251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE
301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLDG LTTGVWSYP LSCCRTAVFS TCHSSPLSYF*



Further work revealed the complete gonococcal DNA sequence <SEQ ID 257>: 
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l 

1 ATGCGCCGCT TCCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGCTGTA
51 CGGATTGACG GCGGCGACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT
101 GGTGGATAGT CTCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT
151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCA ACGGCGTGTT
201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTCACG CTGGTCGCCG
251 TACTGCCCGG CTTGTTCCTG TTCGGCATTT CCGCGCAGTT TATCAACGGC
301 ACGATTAATT CGTGGTTCGG CAACGACACC CACGAAGCCC TCGAACGCAG
351 CCTTAATTTG AGCAAGTCCG CACTGGATTT GGCGGCAGAC AATGCCGTCA
401 GCAACGCCGT TCCCGTACAG ATAGACCTCA TCGGCACCGC CTCCCTGTCG
451 GGCAATATGG GCAGTGTGCT GGAACACTAC GCCGGCAGCG GTTTTGCCCA
501 GCTTGCCCTG TACAATGCCG CAAGCGGGAA AATCGAAAAA AGCATCAATC
551 CGCACCAATT CGACCAGCCG CTTCCCGACA AAGAACATTG GGAACAGATT
601 CAGCAGACCG GTTCGGTTCG GAGTTTGGAA AGCATAGGCG GCGTATTGTA
651 CGCGCAGGGA TGGTTGTCGG CAGGTACGCA CAACGGGCGC GATTACGCGC
701 TGTTCTTCCG CCAGCCGATT CCCGAAAATG TGGCACAGGA TGCCGTTCTG
751 ATTGAAAAGG CGCGGGCGAA ATATGCCGAA TTGAGTTACA GCAAAAAAGG
801 TTTGCAGACC TTTTTTCTGG TAACCCTGCT GATTGCCTCG CTGCTGTCGA
851 TTTTTCTTGC GCTGGTAATG GCACTGTATT TTGCCCGCCG TTTCGTCGAA
901 CCCATTCTGT CGCTTGCCGA GGGCGCAAAG GCGGTGGCGC AGGGTGATTT
951 CAGCCAGACG CGCCCCGTAT TGCGCAACGA CGAGTTCGGA CGTTTGACCA
1001 AGCTGTTCAA CCATATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC
1051 GAACGCAACC GCCGGCGCGA GGAAGCCGCC CGTCACTACC TCGAGTGCGT
1101 GTTGGATGGG TTGACTACCG GTGTGGTGGT GTTTGACGAA AAAGGCCGTT
1151 TGAAAACCTT CAACAAGGCG GCGGAACAGA TTTTGGGGAT GCCGCTCGCC
1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA
1251 GTCCCTGCTT GCCGAAGTGT TtgccgccAT CGGTGCGGCG GCAGGTACGG
1301 ACAAACCGGT CCAGGTGGAA TATGCCGCGC CGGACGATGC CAAAATCCTG
1351 CTGGGCAAGG CGACGGTATT GCCCGAAGAC AACGGCAACG GCGTGGTGAT
1401 GGTGATTGAC GACATCACCG TGCTGATACG CGCGCAAAAA GAAGCCGCGT
1451 GGGGTGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG
1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT
1551 GGACGATCAG GACGCGCAAA TCCTGACGCG TtcgACCGAC ACCATCATCA
1601 AACAGgtggc gGCGTTAAAA GAAATGGTCG AGGCATTCCG CAATTACGCG
1651 CGCGCCCCTT CGCTCAAACT GGAAAATCAG GATTTGAACG CCTTAATCGG
1701 CGATGTTTTG GCCCTGTACG AAGCCGGCCC GTGCCGGTTT GAGGCGGAAC
1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG
1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA
1851 TATGCCCGAA GTCAGGGTAA AATCGGAAAC GGGGCAGGAC GGACGGATTG
1901 TCCTGACGGT TTGCGACAAC GGCAAGGGAT TCGGCAAGGA AATGCTGCAC
1951 AATGCTTTCG AGCCGTATGT GACGGATAAG CCGGCGGGAA CGGGACTGGG
2001 TCTGCCTGTA GTGAAAAAAA TCATTGGAGA ACACGGCGGC CGCATCAGCC
2051 TGAGCAATCA GGATGCGGGT GGGGCGTGTG TCAGAATCAT CTTGCCAAAA
2101 ACGGTAGAAA CTTATGCGTA G

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This corresponds to the amino acid sequence <SEQ ID 258; ORF64ng-l>: 
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15 



1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV
51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING
101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS
151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI
201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL
251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE
301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLDG LTTGVVVFDE KGRLKTFNKA AEQILGMPLA
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVQVE YAAPDDAKIL
451 LGKATVLPED NGNGVVMVID DITVLIRAQK EAAWGEVAKR LAHEIRNPLT
501 PIQLSAERLA WKLGGKLDDQ DAQILTRSTD TIIKQVAALK EMVEAFRNYA
551 RAPSLKLENQ DLNALIGDVL ALYEAGPCRF EAELAGEPLM MAADTTAMRQ
601 VLHNIFKNAA EAAEEADMPE VRVKSETGQD GRIVLTVCDN GKGFGKEMLH
651 NAFEPYVTDK PAGTGLGLPV VKKIIGEHGG RISLSNQDAG GACVRIILPK
701 TVETYA*



ORF64ng-1 and ORF64-1 show 93.8% identity in 706 aa overlap:
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orf64ng-l.pep 



10 20 30 40 50 60 

MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVSFSAMLLLVLSAVLARYVILLLK 

HIM UNI Mi II llllllll Mill II 1111111:1 Mill! Mill Mill II I II 
MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 
10 20 30 40 50 60 

70 80 90 100 110 120 

DRRNGVFGSQIAKRLSGMFTLVAVLPGLFLFG I SAQFINGT INSWFGNDT HEALERS LHL 
| I |:!M I MIHIIM I IMM MM: I II 1:111111111 III III II I III I I! Ml 
DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 

70 80 90 100 110 120 

130 140 150 160 170 180 

SKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAGSGFAQLAL YNAASGKIEK 

|| || | : 1 1| || I : : I I I II I I II M : I II 1:11 I I I I M I I I I M M I M M I I I M I 
SKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLAL YNAASGKIEK 
130 140 150 160 170 180 

190 200 210 220 230 240 

SINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQPI 

t | | | | : : | | I : I | : I I : I t : : I I II : I I I I II M II I I M I II I M II II I I II I M : 
SINPKKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHNGRDYALFFRQPV 

190 200 210 220 230 240 

250 260 270 280 290 300 

PENVAQDAVLIEKARAKYAELS YSKKGLQTFFLVTLLIASLLS I FLALVMALYFARRFVE 
| : : | I : I I I I I I I I I I I I I I I M II M M I I I I : I M M I M M M I I II II I I I M M I 
PKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE 

250 260 270 280 290 300 

310 320 330 340 350 360 

PI LSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLS IAKEADERNRRREEAA 

I : || | | | II I I I I I I II I I I II II I I M I II I M I I II I I II I II M M II M I I I I I M 
PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 
310 320 330 340 350 360 

370 380 390 400 410 420 

RHYLECVLDGLTTGVWFDEKGRLKTFNKAAEQILGMPLAPLWGSSRHGWHGVSAQQSLL 

limiMMIUIIIMIIM I II I 1 I I I I I I M I I I: IMMIIMIIMM 

RHYLECVLEGLTTGVWFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 
370 380 390 400 410 420 

430 440 450 460 470 480 

AEVFAAIGAAAGTDKPVQVEYAAPDDAKI LLGKATVLPEDNGNGWMVI DDITVLIRAQK 
|| | | | | | | II I I I I I I I :|: II I II M II I I I I I II I I I I I I M M M I I I II I II: I M 
AEVFAMGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGVVMVIDDITVLIHAQK 
430 440. 450 460 470 480 

490 500 510 520 530 540 

EAAWGEVAKRLAHEI RNPLT PIQLSAERLAWKLGGKLDDQDAQILTRSTDTI IKQVAALK 
1 1 I 1 1 1 1 1 1 1 1 1 I 1 1 1 1 1 1 1 1 I I I I I I IIMIMMM:MIMIIIMMI:IIIIIM 
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EAAWGEVAKRLAHEIRNPLTPIQLSAERXAWKLGGKLDEQDAQILTRSTDTIWQVAALK 
4 90 500 510 520 530 540 

550 560 570 580 590 6°° 

EMVE^FRNYARAPSLKI^NQDLNALIGDVLALYEAGPCRFEAELAGEPLMMAADTTAMRQ 

| 1 1 | I I | I | t | r I I I I I 1 t 1 1 I I I I 1 1 I | | | J | | 1 I 1 I I I IIIHIII : I I I I I M I I 
EMVEAFRNYARS P S LKLENQDLN AL I GDVLALYE AG PCRFAAELAGE PLT VAADTT AMRQ 
550 560 570 580 590 600 

610 620 630 640 650 660 

VLHNIFKNAAEAAEEADMPEVRVKSETGQDGRIVLTVCDNGKGFGKEMLHNAFEPYVTDK 

| Mill IIMIIII I 11:1111111111 I I I lllll !) M I I 111:111111111 III M 
VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 

610 620 630 640 650 660 

670 680 690 700 

PAGTGLGLPVVKKIIGEHGGRISLSNQDAGGACVRIILPKTVETYAX 
Ml Ml | Ml 111 I I I I I I t I I I I I I I I I I I i I I I I I M I I «* I I I I 
PAGTGLGLPWKKIIEEHGGRISLSNQDAGGACVRIILPKTVKTYAX 

670 680 690 700 



Furthermore, ORF64ng-1 shows significant homology to a protein from A. caulinodans:

sp|Q04850|NTRY_AZOCA NITROGEN REGULATION PROTEIN NTRY >gi|77479|pir||S18624 ntrY
protein - Azorhizobium caulinodans >gi|38737 (X63841) NtrY gene product
[Azorhizobium caulinodans] Length = 771
Score = 218 bits (550), Expect = 7e-56

Identities = 195/720 (27%), Positives = 320/720 (44%), Gaps = 58/720 (8%)
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Sbjct: 
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IAAICAWLLYGLTAATGSTSSLADYFWWIXXXXXXXXXXXXXXXXRYVILLLKDRRNGV 66 
I+A+ ++L GLT++ + R++KRG 

ISALATFLILMGLTPWPTHQWIS VLLVNAAAVLILSAMVGREIWRIAKARARGR 90 



126 



FGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNLSKSALD 

+++ R+ G+F +V+V+P + + +++ ++ ++ WF T E + S++++++ + 
AAARLHIRIVGLFAWSWPAILVAWASLTLDRGLDRWFSMRTQEIVASSVSVAQTYVR 150 

127 LAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAG — SGFAQLALYNAASGKIEKSINP 184 
AN+ + +DL S+ YGSFQ+ AA+++ 



151 EHALNIRGDILAMSADLTRLKSV- 



- YEGDRSRFNQI LTAQAALRNLPGAMLI 200 
233 



185 HQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYA 

+ D+ +++ i+ v + +IG Q + N DY 

201 RR-DLSWERAN-VNIGREFIVPANLAIGDATPDQFVIYLP-- NDADYVAAWPLKDYDD 256 

234 — LFFRQP I PENVAQDAVLIEKARAKYAELS YSKKGLQT FFLVTXXXXXXXXXXXXXVMA 291 

L++IV ++AYL+ G+Q F + + 

257 LYLYVARLIDPRVIGYLKTTQETLADYRSLEERRFGVQVAFALMYAVITLIVLLSAVWLG 316 

Query: 292 LYFARRFVEPILSLAEGAKAVAQGDFSQTRPVLRND-EFGRLTKLFNHMTEQLSIXXXXX 350 

L F++ V PI L A VA+G+ P+ R + + L + FN MT +L 

Sbjct: 317 LNFSKWLVAPIRRLMSAADHVAEGNLDVRVPIYIIAEGDLASLAETFIJKMTHEIJISQREA^ 376 

Query: 351 XXXXXXXXXXXHYIZCVLDGLTTGVWFDEKGRLKTFNKAAEQILGMPLAPLWGSSRHGW 410 

+ E VL G+ GV+ D + R+ N++AE++LG L+ + RH 
Sbjct: 377 LTARDQI DSRRRFTEAVLSGVGAGV IGLDSQERI T I LNRSAERLLG — LSEVEALHRHLA 4 34 

Query: 411 HGVSAQQSLLAEVFXXXXXXXXTDKPVQVEYAAPDDAKILLGKATVLPEDNG NGWM 467 

V LL E + VQ D + + V E + +G V+ 

Sbjct: 4 35 EWPETAGLLEEA EHARQRSVQGNITLTRDGRERVFAVRVTTEQSPEAEHGWW 4 88 

Query: 4 68 VIDDITVLIRAQKEAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDDQDAQILTR 527 

+DDIT LI AQ+ +AW +VA-rR+AHEI +NPLTP IQLSAERL KG + QD -i-I + 
Sbjct: 489 TLDDITELISAQRTSAWADVARRIAHEIKNPLTPIQLSAERLKRKFGRHV-TQDREIFDQ 547 

Query: 528 STDTIIKQVAAIJCEMVEAFTlNYARAPSUaXNQDLNALIGDVLALYEAGPCRFEAELAGE 587 

TDTII+QV + MV+ F ++AR P +++QD++ +1 + L G + 
Sbjct: 54 8 CTDTIIRQVGDIGRMVDEFSSFARMPKPWDSQDMSEIIRQTVFLMRVGHPEWFDSEVP 607 

Query: 588 PLMMAA- DTT AMRQVLHN I FKNXXXXXXXXDMPEVRVK SETGQDGRIVLTVCD 639 

P M A D + Q L NI KN P+VR + + G+D +V+ + D 

Sbjct: 608 PAMPARFDRRLVSQALTNILKNAAEAIEAVP-PDVRGQGRIRVSANRVGED — LVIDIID 664 
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Ouerv 640 NGKGFGKEMLHNAFEPYVTDKPAGTGLGLPWKKIIGEHGGRISLSNQDAG-GACVRIIL 698 
Qaery - m G +E + EPYVT + GTGLGL +V KI+ EHGG I L ++ G GA +R+ L 

Sbjct: 665 NGTGLPQESRNRLLEPYVTTREKGTGLGLAIVGKIMEEHGGGIELNDAPEGRGAWIRLTL 724 

Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 31

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 259>: 

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT
51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC
101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC
151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT
201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT
251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG
301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC
351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC
401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA ACGCATCAAC CGTCATCGGG
451 CACGCGTTGG ATACG...

This corresponds to the amino acid sequence <SEQ ID 260; ORF66>:

1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP 
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA 
101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPNASTVIG

151 HALDT... 

Further work revealed the complete nucleotide sequence <SEQ ID 261>:

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT
51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC
101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC
151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT
201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT
251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG
301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC
351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC
401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA CCGCATCAAC CGTCATCGGC
451 AACGCCTTGG ATACGCTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG
501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC
551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG
601 ATACTGAATC TGCTGACGAA AAAACTGACA ACCCTGCAAA CCAAACAGGC
651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA

This corresponds to the amino acid sequence <SEQ ID 262; ORF66-1>:

1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA
101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPTASTVIG
151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV
201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP*

Computer analysis of this amino acid sequence gave the following results:

Homology with the hypothetical protein o221 of E. coli (accession number P37619)

ORF66 and o221 protein show 67% aa identity in 155aa overlap:
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orf66 1 MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60 

M F+ Q+ KALF L LFH+L+I +SNYLVQ P I G HTTWGAFSFPFIFLATDLTV 
0 221 1 MNV FS QTQR YKAL FWLS L FH LLV I T S SN YL VQLP V S I LG FHTTWGAF S FP F I FLAT DLT V 60 

5 orf66 61 RI FGSHLARRI I FWVMFPALLLS YVFSVLFHNGSWTGIX^LSEFNTFVGRIALASFAAYA 120 

RIFG+ LARRUF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA 
o221 61 RIFGAPLARRHFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120 

orf66 121 IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155 
10 +GQILD+ VFN+LR+ + WW+AP AST+ G+ DT 

o221 121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDT 155 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF66 shows 96.1% identity over a 155aa overlap with an ORF (ORF66a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

o r f 6 6 . pep MYAFTAAQQQKALFRLVLFHI LI I AASNYLVQ FPFQI FG I HTTWGAFS FPFI FLAT DLT V 

iTTrrriTTTTTTi iiiiiiiiiimii ii 1 1 ii i iiinniMi i 1 1 1 1 ! 1 1 1 1 1 

orf 66a MYAFTAAQQQKALFWLVLFHILIIAASNYLVQFPFQI SGI HTTWGAFS FPFIFLATDLTV 

20 10 20 30 T6 50 60 

70 80 90 100 110 120 

orf 66 . pep RIFGSHLARR I I FWVMFPALLLSYVFS VLFHNGSWTGLGALSEFNTFVGRI ALASFAAYA 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 # 1 1 1 1 1 1 f 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 1 1 1 1 i i 

25 orf 66a RIFGSHLARR I I FWVMFPALLLSYVFS VLFHNGSWTGLGALSEFNTFVGRI ALASFAAYA 

70 80 90 100 110 120 

130 140 150 

orf 66 . pep I GQI LDI FV FNKLRRLKAWWIAPNAS TVIGHALDT 
30 ~ : I I 1 ! I i 1 I I t I I 1 I I I I I I • I I : I I I i 1 I : I I I I 

o r f 6 6 a LGQILDIFV FNKLRRLKAWWVAPTAS TVIGNALDTLVFFAVAF YAS S DG FMAANWQG I AF 

130 140 150 160 170 180 

orf 66a VDYLFKLT VCGLFFLPAYGVILNLL TKKLTTLQTKQAQDRPAPSLQNPX 
35 190 200 210 220 

The complete length ORF66a nucleotide sequence <SEQ ID 263> is: 

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCTGGCTGGT
51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC
101 CCTTCCAAAT TTCCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC
151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT
201 GGCACGGCGG ATTATCTTTT GGGTCATGTT CCCCGCCCTT TTGCTTTCCT
251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG
301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC
351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTTGTGTTC AACAAATTAC
401 GCCGTCTGAA AGCGTGGTGG GTTGCCCCGA CTGCATCAAC CGTCATCGGC
451 AACGCCTTAG ATACGTTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG
501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC
551 TGTTCAAACT CACCGTCTGC GGTCTGTTTT TCCTGCCCGC CTACGGCGTG
601 ATTCTGAATC TGCTGACGAA AAAACTGACG ACCCTGCAAA CCAAACAGGC
651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA

This encodes a protein having amino acid sequence <SEQ ID 264>:

1 MYAFTAAQQQ KALFWLVLFH ILIIAASNYL VQFPFQISGI HTTWGAFSFP
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA
101 LSEFNTFVGR IALASFAAYA LGQILDIFVF NKLRRLKAWW VAPTASTVIG
151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC GLFFLPAYGV
201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP*

ORF66a and ORF66-1 show 97.8% identity in 228 aa overlap: 

10 20 30 40 50 60 

orf 66a . pep MYAFTAAQQQKALFWLVLFHILIIAASNYLVQ FPFQI SGI HTTWGAFS FPFI FLAT DLT V 
60 ^ 1 I I I I t ! I I I 1 t I I I I I 1 1 I I I 1 1 1 1 1 I II I I I II I I II I I II M I I I I I I II I I I I I 

orf 66-1 MYAFTAAQQQKALFRLVLFHI LI IAASNYLVQ FPFQ I FG I HTTWGAFS FPFI FLAT DLT V 
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orf 66a. pep 
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70 80 90 100 H° 120 

-7ft 80 qo ioo no A * u 



130 140 150 160 170 1*0 

TTTl 1 1 1 1 1 1 1 M I 1 1 1 1 1 I : I I II M I 1 1 IH M 1 1 1 1 1 1 1 M I 1 1 MM I I Mil 1 1 1 
IGOI LDI FVFNKLRRLKAWWI APT ASTVIGN ALDTLVFFAVAFYAS S DGFMAANWQGIAF 
130 140 150 160 170 180 

190 200 210 220 229 

VDYLFKLT VCGLFFLPAYGVILNLLTKKLTTLQTKQAQDR PAPS LQN PX 
MIlMllll 1 t I I I I M I I I M I t I 1 I 1 1 1 1 1 M M I I I M 1 M I I 1 
VOYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 
190 200 210 220 
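
The identity figure quoted for ORF66a and ORF66-1 can be checked with a short sketch. It assumes an ungapped, position-by-position comparison of <SEQ ID 264> and <SEQ ID 262> as printed in this example, with the terminal stop excluded, and it counts any gap column as a mismatch. The function name `percent_identity` is illustrative, and the alignment program and scoring convention actually used are not stated in the text.

```python
def percent_identity(a: str, b: str) -> tuple[int, int, float]:
    """Return (matches, aligned_length, percent identity) for two aligned strings.

    Both strings must already be aligned (same length); '-' marks a gap and
    is counted as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches, len(a), 100.0 * matches / len(a)


# <SEQ ID 264> (ORF66a) and <SEQ ID 262> (ORF66-1), stop codon omitted.
ORF66A = (
    "MYAFTAAQQQKALFWLVLFHILIIAASNYLVQFPFQISGIHTTWGAFSFP"
    "FIFLATDLTVRIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGA"
    "LSEFNTFVGRIALASFAAYALGQILDIFVFNKLRRLKAWWVAPTASTVIG"
    "NALDTLVFFAVAFYASSDGFMAANWQGIAFVDYLFKLTVCGLFFLPAYGV"
    "ILNLLTKKLTTLQTKQAQDRPAPSLQNP"
)
ORF66_1 = (
    "MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFP"
    "FIFLATDLTVRIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGA"
    "LSEFNTFVGRIALASFAAYAIGQILDIFVFNKLRRLKAWWIAPTASTVIG"
    "NALDTLVFFAVAFYASSDGFMAANWQGIAFVDYLFKLTVCTLFFLPAYGV"
    "ILNLLTKKLTTLQTKQAQDRPAPSLQNP"
)

if __name__ == "__main__":
    matches, length, pct = percent_identity(ORF66A, ORF66_1)
    print(f"{matches}/{length} identical residues ({pct:.1f}%)")
```

With these inputs the two proteins differ at five of 228 positions, which is consistent with the 97.8% identity stated above.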



Homology with a predicted ORF from N. gonorrhoeae

ORF66 shows 94.2% identity over a 155aa overlap with a predicted ORF (ORF66.ng) from N.
gonorrhoeae:

orf 66. pep 

orf66ng 

orf 66. pep 

orf66ng 

orf66.pep 

orf66ng 



MYAFTAAQQQKALFRLVLFHILI IAASNYLVQFPFQI FGI HTTWGAFSFPFI FLATDLTV 
III" I U Ml I I I I I I I M II M I I M I I M I I 1 1 M M I I I 11 I I I I I II II I M I 1 1 I 
llrf^T AAQWK^E^VLFH ILI IAASN Y LVQFPFR I FGI HTTWGAFS F P FI FLAT DLTV 

RIFGSHIARRI I FWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRI ALASFAAYA 
I H || tm | | Mill MM I I I I I I I 1 1 I 1 I M 1 1 1 1 ! 1:11111111111 IMIII 
R I FGSHLARRI I FWVMFPALSLSYVFSVLFHNGSWTGLGAPSQFNT FVGRI ALASFAAYA 

IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 
- i I | | | | | | | : | | | 1 I II II I II M M M : II M 

LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 



60 



60 



120 



120 



155 



180 



The complete length ORF66ng nucleotide sequence <SEQ ID 265> is:

1 ATGTACGCAT TGACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT
51 GCTTTTCCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC
101 CCTTCCGGAT TTTCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC
151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT
201 GGCGCGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT ttgCTTTcat
251 aCGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG
301 ctgTCCCAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC
351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTCGTATTC GACAAATTAC
401 GCCGTCTGAA AGCGTGGTGG ATTGCCCCGG CCGCATCAAC CGTCATCGGC
451 AATGCACTGG ACACGTTAGT ATTTTTTGCC GTTGCCTTTT ACGCAAGCAG
501 CGATGAATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC
551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG
601 ATACTGAATC TGCTGACGAA AAAACTGACG GCCCTGCAAA CCAAACAGGC
651 GCAAGACCGC CCCGTGCCCT CGCTGCAAAA TCCGTAA




This encodes a protein having amino acid sequence <SEQ ID 266>:

1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL SLSYVFSVLF HNGSWTGLGA
101 PSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG
151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV
201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP*

An alternative annotated sequence is:

1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA
101 LSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG
151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV
201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP*

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ORF66ng and ORF66-1 show 96.1% identity in 228 aa overlap: 

orf 66-1 . pep MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWC^FSFPFIFIATDLTV 60 

I II :| II l| I Ml Ml II IIIM II Ml III MM:) I M I 1 I 1 | I I I I I II I I I _ 

orf66ng myaltaaqqqkalfrlvlfhiliiaasnylvqfpfrifgihttwgafsfpfiflatdltv 60 

orf 66-1. pep RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTF7GRIALASFAAYA 120 
I I t I I I I I II I M I M I 1 M I M 1 1 II M I I I I I I I I M M I : I I I I I M I M M I I I I I _ 
or f 6 6ng RI FGSHLARRI I FWVMFPALLLS YVFSVLFHNGSWTGLGALSQFNTFVGRI ALAS FAAYA 120 

10 orf 66-1. pep IGQILDIFVFNKLRRLKAWWIAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF 180 

: I II I I I I M : M II 1 I I M I H : I M M 1 I I I M I I I I I I M II I I I II I M I M I I I 
orf66ng LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180 

orf 66-1 .pep VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 229 

15 I I II I I I I I I M I i M I I I M I I I I I I I M: I I I I II I I I I ' H I I I I I 

orf66ng VD YL FKLTVCT LF FL P AYGVI LN LLT KKLT ALQTKQAQ DR P V PS LQN PX 229 

Furthermore, ORF66ng shows significant homology with an E. coli ORF:

sp|P37619|YHHQ_ECOLI HYPOTHETICAL 25.3 KD PROTEIN IN FTSY-NIKA INTERGENIC
REGION (O221)
>gi|1073495|pir||S47690 hypothetical protein o221 - Escherichia coli >gi|466607
(U00039) No definition line found [Escherichia coli] >gi|1789882 (AE000423)
hypothetical 25.3 kD protein in ftsY-nikA intergenic region [Escherichia coli]
Length = 221
Score = 273 bits (692), Expect = 5e-73

Identities = 132/203 (65%), Positives = 155/203 (76%)
Query: 1 MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60

M + Q+ KALF L LFH+L+I +SNYLVQ PIG HTTWGAFSFPFIFLATDLTV
Sbjct: 1 MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60

Query: 61 RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGRIALASFAAYA 120

RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA
Sbjct: 61 RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120

Query: 121 LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180

LGQILD+ VF++LR+ + WW+AP AST+ GN DTL FF +AF+ S D FMA +W IA
Sbjct: 121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDTLAFFFIAFWRSPDAFMAEHWMEIAL 180

Query: 181 VDYLFKLTVCTLFFLPAYGVILN 203
VDY FK+ + +FFLP YGV+LN

Sbjct: 181 VDYCFKVLISIVFFLPMYGVLLN 203

Based on this analysis, including the homology with the E. coli protein and the presence of several
putative transmembrane domains in the gonococcal protein, it is predicted that these proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.
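
The text does not say which program was used to predict transmembrane domains. Purely as an illustration, a Kyte-Doolittle style hydropathy scan can flag hydrophobic stretches of the kind referred to above. The function names, the 19-residue window and the 1.6 cut-off are assumptions of this sketch, not the method used in the analysis.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window; unknown residues (e.g. X) score 0."""
    scores = [KD.get(res, 0.0) for res in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(protein, window=19, cutoff=1.6):
    """0-based start positions of windows whose mean hydropathy reaches the cutoff."""
    profile = hydropathy_profile(protein, window)
    return [i for i, mean in enumerate(profile) if mean >= cutoff]


# First 60 residues of ORF66ng as listed above (hypothetical variable name).
orf66ng_nterm = "MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV"
print(hydrophobic_windows(orf66ng_nterm))
```

A window that passes the cut-off is only a candidate; a real prediction would normally be checked with a dedicated topology predictor.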

Example 32 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 267>: 

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAAyGCA GTmwrAATAT

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC AyyCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGgCG CGAAATTCAG CACAAGGGCG GTtCCCTATG TCGGAACAGC

351 CcTTTTAGCC CACGACGTAT ACGAAAcTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGTAAA AGGCTACGAA 

451 TATAGTAATT GCCTTTGGTA CGAAGACAAA AGACGTATTA ATAGAACCTA 
501 TGGCTGCTAC GGCGTTGAT. .

This corresponds to the amino acid sequence <SEQ ID 268; ORF72>: 

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VXISETVSVD TGQGAKIHKF

51 VPKNSKTYSS DLIKTVDLTH XPTGAKARIN AKITASVSRA GVLAGVGKLA

101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFVKGYE

151 YSNCLWYEDK RRINRTYGCY GVD. .
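
The partial DNA sequence contains IUPAC ambiguity codes (y, m, w, r), and the corresponding residues appear as X only where the codon cannot be resolved. A minimal sketch of that translation rule is given below, assuming the standard codon assignments; the names `translate`, `CODON_TABLE` and `IUPAC` are illustrative and not taken from the source.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate(dna):
    """Translate in frame 1; an ambiguous codon yields X unless every
    possible reading encodes the same residue."""
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        # Unrecognised letters are treated like N (any base).
        options = [IUPAC.get(base, "ACGT") for base in seq[i:i + 3]]
        residues = {CODON_TABLE["".join(c)] for c in product(*options)}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# Codons covering the ambiguous stretch "GCA AAy GCA GTm wrA" of SEQ ID 267.
print(translate("GCAAAyGCAGTmwrA"))
```

Here `AAy` and `GTm` still resolve to single residues, while `wrA` does not, which matches the single X at that point in ORF72.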

Further work revealed the complete nucleotide sequence <SEQ ID 269>: 

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC 

451 TAA 

This corresponds to the amino acid sequence <SEQ ID 270; ORF72-1>:

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA

101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG
151 *
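
The complete sequence ends at the TAA codon at nucleotides 451-453, which is why ORF72-1 stops after residue 150. The following sketch, with a hypothetical helper name, shows how the first in-frame stop codon can be located; it assumes the input starts on a codon boundary.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(dna):
    """Number of in-frame codons before the first stop codon,
    or None if no in-frame stop is present."""
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    for index in range(0, len(seq) - 2, 3):
        if seq[index:index + 3] in STOP_CODONS:
            return index // 3
    return None


# Nucleotides 421-453 of SEQ ID 269 (position 421 starts a codon).
tail = "GAAACCGACA AATTTGCAAA GGTCTCAGGC TAA"
print(codons_before_stop(tail))  # 10 codons, i.e. ETDKFAKVSG, then the stop
```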

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)
ORF72 shows 98.0% identity over a 147aa overlap with an ORF (ORF72a) from strain A of N.
meningitidis: 

10 20 30 40 50 60 

orf72 oep K^TyV TNLNFAKLSIIAILMMYSFEANAN AVXISETVSVDTGQGAKIHKFVPKNSKTYSS 

0 pp mi hi limn iiminii ii i i i m in i ii i mm Minimum i 

30 orf72a MVIKYTNLNFAKLSI IAI LMMYS FEANA NAVKI SETVSVDTGQGAKIHKFVPKNSKTYSS 

Yo 20 ^~30 40 50 60 

70 80 90 100 110 120 

orf72 Pep DLIKTVDLTHXPTGAKARIRAKITASVSRAGVIJVGVGKLARI^AKFSTRAVPYVGTALLA 

35 1 P P nilllltll I I I I HI III III I I IN II I I Ml III ! Ml I IN I Ml I I I II I I I I 

3J orf72a DLIKTVDLTH I PTGAKARINAKITASVSRAGVLAGVGKLAKLGAKFSTRAVPYVGTALLA 

70 80 90 100 HO 120 

130 140 150 160 170 

40 orf72 pep HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD 

| I I II I HI I Ml I I Ml Ml Ml I : I 
orf72a HDVYETFKEDIQARGYQYDPETDKFAKVSGX 
130 140 150 

The complete length ORF72a nucleotide sequence <SEQ ID 271> is: 

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC

301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC 

451 TAA 

This encodes a protein having amino acid sequence <SEQ ID 272>:

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA
101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG
151 *

ORF72a and ORF72-1 show 100.0% identity in 150 aa overlap: 
                    10        20        30        40        50        60
orf72a.pep  MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf72-1     MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf72a.pep  DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf72-1     DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
                    70        80        90       100       110       120

                   130       140       150
orf72a.pep  HDVYETFKEDIQARGYQYDPETDKFAKVSGX
            ||||||||||||||||||||||||||||||
orf72-1     HDVYETFKEDIQARGYQYDPETDKFAKVSGX
                   130       140       150

Homology with a predicted ORF from N. gonorrhoeae 
ORF72 shows 89% identity over a 173aa overlap with a predicted ORF (ORF72.ng) from N.
gonorrhoeae: 

orf72.pep MVIKYTNLNFAKLSIIAILMMYSFEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS 60

II I : I I I I II I I I I I I I I I I I ! I I I I II I I I I I I : I I I I I I I I I : I I I I I I : I : Ml
orf72ng MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS 60

orf72.pep DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 120

II 1:11111 I II I I I I I I I I I I I I I I I I I I I : I I I I I : I I I I I : I I I I I I I I I I I I I
orf72ng DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA 120

orf72.pep HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD 173

! I I I I I I I I I ! I 1 I I : I I I I I I I I I I I I I I : II I I I I I : I I I I I I I i I I I I I
orf72ng HDVYETFKEDIQARGCRYDPETDKFVKGYEYANCLWYEDERRINRTYGCYGVDSSIMRLM 180

An ORF72ng nucleotide sequence <SEQ ID 273> was predicted to encode a protein having amino 
acid sequence <SEQ ID 274>: 

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF

51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV

101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKFVKGYE

151 YANCLWYEDE RRINRTYGCY GVDSSIMRLM PDRSRFPEVK QLMESQMYRL

201 ARPFWNWRKE ELNKLSSLDW NNFVLNRCTF DWNGGGCAVN KGDDFRAGAS

251 FSLGRNPKYK EEMDAKKPEE ILSLKVDADP DKYIEATGYP GYSEKVEVAP

301 GTKVNMGPVT DRNGNPVQVA ATFGRDAQGN TTADVQVIPR PDLTPASAEA

351 PHAQPLPEVS PAENPANNPD PDENPGTRPN PEPDPDLNPD ANPDTDGQPG

401 TSPDSPAVPD RPNGRHRKER KEGEDGGLSC DYFPEILACQ EMGKPSDRMF

451 HDISIPQVTD DKTWSSHNFL PSNGVCPQPK TFHVFGRQYR ASYEPLCVFA

501 EKIRFAVLLA FIIMSAFWFG SLGGE*

After further analysis, the following gonococcal DNA sequence <SEQ ID 275> was identified: 

1 ATGGTCACAA AACATACAAA TTTGAATTTT GCGAAATTG? CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTCT TTCGGTTGAT ACCGGACAAG GCGCGAAAGT TCATAAGTTC 

55 151 GTTCCTAAAT CAAGTAATAT TTATTCATCT GATTTAACAA AAGCGGTAGA 

201 TTTAACGCAT ATCCCCACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGT CGGGGGTCGG CAAACTTGTC 

301 CGCCAAGGCG CGAAATTCGG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTGCCG ATACGATCCC GAAACCGACA AATTT
This corresponds to the amino acid sequence <SEQ ID 276; ORF72ng-1>:

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF
51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV
101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKF

ORF72ng-1 and ORF72-1 show 89.7% identity in 145 aa overlap:

10 20 30 40 50 60 

orf72nq-l.pe MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS 
1} |:UI||!tllltilMIIIIIIMIIIIIIIt:llli!IHI:IIIMl:t: ill 
orf72-l MVIKYTNI^FAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS 
10 * 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 72nq-l . pe DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA 
II llimiMMII I i I 1 I 111111:11111:1 II II : I I I I II I I I I I I I 

1 5 orf 72-1 DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 

70 80 90 100 110 120 

130 140 
orf72ng-l.pe HDVYETFKEDIQARGCRYDPETDKF 
20 I I I I I I I I I I I I II I : M I I M II 

orf 72-1 HDVYETFKEDIQARGYQYDPETDKFAKVSGX 
130 140 150 
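
The identity figure quoted for ORF72ng-1 and ORF72-1 can be re-derived from the two listed sequences. The sketch below assumes a gap-free alignment over the first 145 residues, as the alignment above shows, and counts identical positions; `percent_identity` is an illustrative helper, not the program used for the original comparison.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (percent identity, overlap length) for two equal-length
    aligned sequences; columns containing a gap ('-') are ignored."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no overlapping residues")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs), len(pairs)


# SEQ ID 270 (ORF72-1) and SEQ ID 276 (ORF72ng-1), written in blocks of ten.
ORF72_1 = (
    "MVIKYTNLNF" "AKLSIIAILM" "MYSFEANANA" "VKISETVSVD" "TGQGAKIHKF"
    "VPKNSKTYSS" "DLIKTVDLTH" "IPTGAKARIN" "AKITASVSRA" "GVLAGVGKLA"
    "RLGAKFSTRA" "VPYVGTALLA" "HDVYETFKED" "IQARGYQYDP" "ETDKFAKVSG"
)
ORF72NG_1 = (
    "MVTKHTNLNF" "AKLSIIAILM" "MYSFEANANA" "VKISETLSVD" "TGQGAKVHKF"
    "VPKSSNIYSS" "DLTKAVDLTH" "IPTGAKARIN" "AKITASVSRA" "GVLSGVGKLV"
    "RQGAKFGTRA" "VPYVGTALLA" "HDVYETFKED" "IQARGCRYDP" "ETDKF"
)

identity, overlap = percent_identity(ORF72_1[:len(ORF72NG_1)], ORF72NG_1)
print(f"{identity:.1f}% identity in {overlap} aa overlap")
```

With 130 identical positions out of 145, the result agrees with the 89.7% stated above.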

Based on this analysis, including the presence of a putative leader sequence and transmembrane
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

Example 33 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 277>: 

30 1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT 

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCAAACCGGG 

151 GCTGACCGGT CTTTTATTGG CGGGCGCGGC AATGAGAAGC GGCGGGAAGG 

201 TATCCGTTTA TCAGATGTTG TGGCCTATC . . 

This corresponds to the amino acid sequence <SEQ ID 278; ORF73>:

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRQTG 
51 LTGLLLAGAA MRSGGKVSVY QMLWPI . . 

Further work revealed the complete nucleotide sequence <SEQ ID 279>: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

40 51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT 

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCATACGGGG 

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT 

201 ATCCGTTTAT CAGATGTTGT GGCCTATCCG TTATACGGTG GCGGCTGTGT 

251 GTCTGATGAG TCCGGGATTC GTATCCTCGG TGTTGGCGGT ATTGCTGCTG 

45 301 CTGCCGTTTA AGGGAGGGGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT 

351 TTTCAACATG AACCAATCGG GCAGAAAAGA GGGCTTTTCC CGCGATGACG 

401 ATATTATCGA GGGAGAATAT ACGGTTGAAG AGCCTTACGG CGGCAATCGT 

451 TCCCGAAACG CCATCGAACA CAAAAAAGAC GAATAA 

This corresponds to the amino acid sequence <SEQ ID 280; ORF73-1>:

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRHTG

51 LSGLLLAGAA MRSGGRVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL
101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFS RDDDIIEGEY TVEEPYGGNR
151 SRNAIEHKKD E*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF73 shows 90.8% identity over a 76aa overlap with an ORF (ORF73a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 73 .pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFA AGVLMLRQTGLTGLLLAGAA 
I I I I I I I | | | | I | I I I I I I I I I 1 I I I I I ! I I I I I I M i I I I I : I I I : I I t : I I I M I I I 
orf 7 3a MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFA AGWMLRHTGLSGLLLAGAA 
10 10 20 30 40 50 60 

70 

orf 7 3. pep MRSGGKVSVYQMLWPI 
I I II I : I I I I ill I 

15 orf 7 3a MRSGGRVSVYXMLWXIRYTVAAV CXMSPGFVSSVXAVLLXL PFKGGAVLQAGGAENFFNM 

The complete length ORF73a nucleotide sequence <SEQ ID 281> is:

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGTTGGG CGGCGGTTGG ACGCTGTTTC 

101 TAATGGCGGC AACCTTTGCC GCCGGCGTGG TGATGCTCAG GCATACGGGG 

20 151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT 

201 ATCCGTTTAT CANATGTTGT GGCNTATCCG TTATACGGTG GCGGCGGTGT 

251 GTCNGATGAG TCCGGGATTC GTATCCTCGG TGTNGGCGGT ATTGCTGNTG 

301 CTNCCGTTTA AGGGAGGTGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT 

351 TTTCAACATG AACCANTCGG GCAGAAAAGA NGGCNTTTCC CGCGATGACG 

25 401 ATATTATCGA GGGGGAATAT ACGGTTGAAG ANCCTTACGG CGGCANTCGT 

451 TTCCGAAACG CCNTNGAACA CAAAAAAGAC GAATAA 

This encodes a protein having amino acid sequence <SEQ ID 282>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVVMLRHTG

51 LSGLLLAGAA MRSGGRVSVY XMLWXIRYTV AAVCXMSPGF VSSVXAVLLX

101 LPFKGGAVLQ AGGAENFFNM NXSGRKXGXS RDDDIIEGEY TVEXPYGGXR

151 FRNAXEHKKD E*

ORF73a and ORF73-1 show 91.3% identity in 161 aa overlap 

10 20 30 40 50 60 

or f 7 3a . pep MRFFG I G FLVLL FLE IMS I VWVADWLGGGWT L FLMAAT FAAG WMLRHTG LSG LLLAG AA 
35 ' I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I H I I I : I I I I I I I I I I I 11 I I I 

orf 7 3-1 MRF FG I G FLVLL FLE I M S I VWV ADW LGGGWTLFLMAAG FAAG VLMLRHTG L SG LL LAG AA 

10 20 30 40 50 60 

70 80 90 100 110 120 

40 orf 73a . pep MRSGGRVSVYXKLWXIRYTVAAVCXMSPGFVSSVXAVLLXLPFKGGAVLQAGGAENFFNM 

I I I i 1 I I II I III I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf73-l MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 
70 80 90 100 110 120 

45 130 140 150 160 

orf 73a .peo NXSGRKXGXSRDDDIIEGEYTVEXPYGGXRFRNAXEHKKDEX 
I M I I t I I I II M I I I I I I I I I I I I ill I I I I t I i 
orf 73-1 NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX 
130 140 150 160 

50 

Homology with a predicted ORF from N. gonorrhoeae

ORF73 shows 92.1% identity over a 76aa overlap with a predicted ORF (ORF73.ng) from N. 
gonorrhoeae: 

orf73.pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRQTGLTGLLLAGAA 60
I I 1 I I I I I I I I I M I I I M I I M I I I I I I I I II I I I I I I I I M I I I : I I I : I I I I I I M
orf73ng MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA 60

orf73.pep MRSGGKVSVYQMLWPI 76
orf73ng VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 120

The complete length ORF73ng nucleotide sequence <SEQ ID 283> is: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAAATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGTTGG AcgcTGTTTC 

101 TAATGGCGGC AACCTTTGCC GCCGGTGTGC TGATGCTCAG GCATAcggGG 

151 CTGTCCGGTC TTTTATTGGC TGGCGCGGCG GTAAAAagta gtgGGAAGGT

201 ATCTGTTTAT CagatgtTGT GGCCTATCCG TTATAcggtg gcggcggtgT 

251 GTCTGatgag tCcggGATTC GTATCCTccg tgttggCGGT ATTGCTGCTG 

301 CTGCcgttta aggGaggGgc agtgttgcag gcaggaggtg cggaaaATTT 

351 TTTCAACATg aaCcaatcgg gcagaaAaga gggatttttc cacgatgacg 

401 atattatcga gggagaatat acggttgaaa aacctgacgg cggcaatcgt

451 tcccgaAAcg ccatcgaaca cgaaaAagac gaataA 

This encodes a protein having amino acid sequence <SEQ ID 284>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVLMLRHTG

51 LSGLLLAGAA VKSSGKVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL

101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFF HDDDIIEGEY TVEKPDGGNR

151 SRNAIEHEKD E*

ORF73ng and ORF73-1 show 93.8% identity in 161 aa overlap:

10 20 30 40 50 60 

orf7 3-l oeD MRFFGIGFLVLLFI£IMSIVWVADWLGGGWTLFLMAAGF7^GVLMLRHTGLSGLI*LAGAA 

25 MIIIIIIIIIMIIMMMM1M IM11IIMM ^ 1 1 1 1 1 1 1 1 1 1 1 1 M I I Ml I 

orf73nq MRFFGIGFLVLLFLEIMSIWA/ADWLGGGWTLFLMAAT FAAGVLMLRHTGLSGLLLAGAA 

10 20 30 40 50 60 

70 80 90 100 110 120 

10 orf7 3-l Deo MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 

orf73na vKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 
70 80 90 100 110 120 

35 130 140 150 160 

orf 73-1 . pep NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX 
t | | | 1 1 1 1 I : I I I I I I ) I 1 1 1 1 : 1 I I I 1 1 t 1 1 I I 1 : 1 I 1 1 
orf73nq N QSGRKEGFFHDDDIIEGEYTVEKPDGGNRSRNAIEHEKDEX 
130 140 150 160 

Based on this analysis, including the presence of a putative leader sequence and putative
transmembrane domain in the gonococcal protein, it is predicted that the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.

Example 34 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 285>:

1 ATGTTTGTTT TTCAGACGGC ATTCTT.ATG TTTCAGAAAC ATTTGCAGAA 

51 AGCCTCCGAC AGCGTCGTCG GAGGGACATT ATACGTGGTT GCCACGCCCA 

101 TCGGCAATTT GGCGGACATT ACCCTGCGCG CTTTGGCGGT ATTGCAAAAG 

151 GCG GCCGA AGACACGCGC GTTACCGCAC AGCTTTTGAG 

50 201 CGCGTACGGC ATTCAGGGCA AACTCGTCAG TGTGCGCGAA CACAACGAAC 

251 GGCAGATGGC GGACAAGATT GTCGGCTATC TTTCAGACGG CATGGTTGTG 

301 GCACAGGTTT CCGATGCGGG TACGCCGGCC GTGTGCGACC CGGGCGCGAA 

351 ACTCGCCCGC CGCGTGCGTG AGGCCGGGTT TAAAGTCGTT CCCGTCGTGG 

401 GCGCAAC.GC GGTGATGGCG GCTTTGAGCG TGGCCGGTGT GGAAGGATCC 

55 451 GATTTTTATT TCAACGGTTT TGTACCGCCG AAATCGGGAG AACGCAGGAA 
501 ACTGTTTGCC AAATGGGTGC GGGCGGCGTT TCCTATCGTC ATGTTTGAAA

551 CGCCGCACCG CATCGGTGCA GCGCTTGCCG ATATGGCGGA ACTGTTCCCC 

601 GAACGCCGAT TAATGCTGGC GCGCGAAATT ACGAAAACGT TTGAAACGTT 

651 CTTAAGCGGC ACGGTTGGGG AAATTCAGAC GGCATTGTCT GCCGACGGCG 

5 701 ACCAATCGCG CGGCGAGATG GTGTTGGTGC TTTATCCGGC GCAGGATGAA 

751 AAACACGAAG GCTTGTCCGA GTCCGCGCAA AACATCATGA AAATCCTCAC 

801 AGCCGAGCTG CCGACCAAAC AGGCGGCGGA GCTTGCTGCC AAAATCACGG 

851 GCGAGGGAAA GAAAGCTTTG TACGAT. . 

This corresponds to the amino acid sequence <SEQ ID 286; ORF75>: 

1 MFVFQTAFXM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK

51 A AEDTR VTAQLLSAYG IQGKLVSVRE HNERQMADKI VGYLSDGMVV

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGAXAVMA ALSVAGVEGS

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPIV MFETPHRIGA ALADMAELFP

201 ERRLMLAREI TKTFETFLSG TVGEIQTALS ADGDQSRGEM VLVLYPAQDE

251 KHEGLSESAQ NIMKILTAEL PTKQAAELAA KITGEGKKAL YD. .

Further work revealed the complete nucleotide sequence <SEQ ID 287>: 

1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

20 151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG AT7GTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 

25 401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

4 51 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC 

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

30 651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 

851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence <SEQ ID 288; ORF75-1>:

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE

201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF75 shows 95.8% identity over a 283aa overlap with an ORF (ORF75a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 75 . pep MFVFQTAFXMFQKHLQKASDSWGGTLYWATPIGNLADITLRALAVLQKAXXXXAEDTR 

II I i I I I I ! I I 1 I I I I I I I I I I I I I I I I I II I I I I I I I I i I I Mill 
orf 75a MFQKHLQKASDSWGGTLYWATPIGNLADITLRALAVLQKADIICAEDTR 
50 10 20 30 40 50 

70 80 90 100 110 120 

or f 7 5 . pep VTAQLLSAYG I QGKLVS VREHNE RQMADK I VG YLS DGMWAQV S DAGT P AVCD PG AKLAR 
I || I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I II I I I II 
55 orf 75a VTAQLLSAYG I QGKLVSVREHNERQMADKI VG YLS DGMWAQVS DAGT PAVC DPGAKLAR 

60 70 80 90 100 110 

130 140 150 160 170 180 

orf 7 5. pep R VREAG FK W P WGAXAVMAAL S V A GVEG SDFYFNGFVPPKS GERRKLFAKWVRAAF P IV 
orf75a 

or f 7 5. pep 
orf75a 

or f 7 5. pep 
orf75a 

orf75a 



IH| :m , m iH t I I I I I i 1 I I I 1 M I 1 1 1 1 I M I 1 I I I 1 I I 1 1 1 1 1 1 1 r I I I - " 
RVREVGFKWPWGASAVMAALSWGVAGSDF^ 

120 130 140 150 160 170 



230 



240 



190 200 210 220 

MFETPHRIGAAIAIM^LFPERRI^U^ITKTFETFLSGTVGEIQTALSADGDQSRGEM 

|| | || |: M | til II Ml II M III II M Ml I!' > MI 1 IHIhl II: I I I IN 
MFTTPHRIGATIADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM 

180 190 200 210 220 



230 



250 260 270 280 290 

VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD 

I I I I I I I I I I I 1 | I II M I M I II II 11 M I M M M M M M M it II M I 
VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNK 

240 250 260 270 280 290 



The complete length ORF75a nucleotide sequence <SEQ ID 289> is:

1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG

151 CGCGTTACCG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT

201 CAGCGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGTCGG

351 GTTTAAAGTT GTCCCTGTTG TCGGCGCAAG CGCGGTGATG GCGGCTTTGA

401 GTGTGGCTGG TGTGGCGGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGTGGC

501 GTTTCCCGTC GTGATGTTTG AAACGCCGCA CCGCATCGGG GCGACGCTTG

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC

801 GGAGCTTGCC GCCAAAATCA CGGGCGAGGG AAAAAAAGCT TTGTACGATC

851 TGGCACTGTC TTGGAAAAAC AAATGA

This encodes a protein having amino acid sequence <SEQ ID 290>:

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP

101 AVCDPGAKLA RRVREVGFKV VPVVGASAVM AALSVAGVAG SDFYFNGFVP

151 PKSGERRKLF AKWVRVAFPV VMFETPHRIG ATLADMAELF PERRLMLARE

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K*



ORF75a and ORF75-1 show 98.3% identity in 291 aa overlap: 






orf75a.pep 
orf75-l 



orf75a.pep 
orf75-l 



or f 7 5a. pep 
orf75-l 



10 20 30 40 50 60 

MFQKHLQKASDSWGGTLYWATP1GNLADITLRALAVLQKADIICAEDTRVTAQLLSAY 
I I I , | 1 1 1 1 1 M M M M II I I M I M II M M I M II M M M II II II M 1 1 M M M 
MFQKHLQKASDSWGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY 

10 20 . 30 40 50 60 

70 80 90 100 110 120 

G IQGKLVSVREHNERQMADKI VGYLS DGMWAQVS DAGT PAVCDPGAKLARRVREVGFKV 
IIIIIIMIIIIIIMMIMMIMMMMMIMMIMMIIMMMIIMMM 
GIOGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLARRVREAGFKV 

70 80 90 100 110 120 



130 140 150 160 170 180 

VPWGASAVMAALSVAGVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVVMFETPHRIG 

I I I I I I | I I I I I I I I I I I I I I I 1 I I I M I I I II M II M I M M : I M : M I M M M I 
VPWGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKVfVRAAFP I VMFETPHRIG 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf75a Dep m ATIADMAELFPERRI^LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD 
° r P P , , , , , , , , 1 1 1 M I M II I I M I I M II I I M II I I M I I : I M I II II M I I M I I M I 



I 
orf75a.pep



250 260 270 280 290 

EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDIALSWI^KX 

1 1 I 1 1 M | I I | 1 1 1 1 1 1 1 1 1 I I M U M M I M 1 1 1 1 1 1 1 I I M I I I jUll 

EKHEGLSESAQNIMKILT 

250 260 270 28° 

Homology with a predicted ORF from N. gonorrhoeae

ORF75 shows 93.2% identity over a 292aa overlap with a predicted ORF (ORF75.ng) from N. 

gonorrhoeae: 

orf75.pep MEVFQTAFXMFQKHLQKASDSW^^ 56 
ori/D.pep i , | | 1 I , 1 | | | 1 M I I t I t I I I I II 1 I I 1 I t 1 1 M I I t I U I Mill 

15 orf75ng msVFQTAFFMFQK^ 

orf 7 5 . pep VTAQIXSAYGIQGKLVSVREHNERQMADKIVGYLSDGMW 

111 I I II I I I II I: I II Ml !1 I I I M I I:: !: lit l: I I M I I I I II II I I I I I I I Nl 

VTAQLLSAYGIQGRLVSTO 120 
20 - -axaVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV 176 

180 

236 

240 



6C 
116 



orf75ng 

orf75.pep HVREAG^VPWGA. | | | | | | | | I I I 1 I I M I I ! I I I I I M I IK 

orf 7 Sng RVREAGF7CWPWGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVBAAFPVV 

25 orf 7 5. pep MFETPHRIGAAIADMAELFPERRl^IAREITKTFETFLSG^ 



orf75ng 
orf75.pep 



MFETPKRIGATliD^LFPERRI^l^ITKTFETFLSGTVGEIQTALAADGNQSRGEM 

VLVLYPAQDEKHEGLSESAONIMKILTAELPTKQAAELAAKITGEGKKALYD 288 

•JH III MM III I I II III I II I I! II: I I II II I I III II I llll I 

30 orf75ng vLVLYPAqJ>E^EGLSESAQNAMKILAAELPTC 300 

An ORF75ng nucleotide sequence <SEQ ID 291> was predicted to encode a protein having amino
acid sequence <SEQ ID 292>: 

1 MSVFQTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK

51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGASAVMA ALSVAGVAES

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP

201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE

251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK

301 *

After further analysis, the following gonococcal DNA sequence <SEQ ID 293> was identified: 

1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG

151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT

201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT

251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG

351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA

401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC

501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG

551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA

651 gacggcattg gcggcggacg gcaaccaatc gcgcggcgag atggtgttgg

701 tgctttatcc ggcgcaggat gaaaaacacg aaggcttgtc cgagtctgcg

751 caaaatgcga tgaaaatcct tgcggccgag ctgccgacca agcaggcggc

801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT

851 TGGCACTGTC GTGGAAAAAC AAATGA

This corresponds to the amino acid sequence <SEQ ID 294; ORF75ng-1>:

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVAE SDFYFNGFVP

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K*



ORF75ng-1 and ORF75-1 show 96.2% identity in 291 aa overlap:






orf75-l .pep 
orf75ng-l 



orf75-l.pep 
orf75ng-l 



orf75-l .pep 
orf75ng-l 



orf75-l.pep 
orf75ng-l 



orf75-l.pep 
orf75ng-l 



10 20 30 40 50 60 

MFQKHLQKAS OS WGGT LYWATP IGNLADI T LRALAVLQKADI I CAEDTRVTAQLLSAY 
1 I f I I I I I t I f I I | I I I I J I I I I I I I 1 I I I 1 I t I I 1 I I 1 I I I I I I I I 1 I i I I t I I I I I I 1 
MFQKHLQKAS DS WGGTLYWAT P IGNLADI T LRALAVLQKADI I CAEDTRVTAQLLSAY 

10 20 30 40 50 60 

70 80 90 100 110 120 

GIQGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLARRVREAGFKV 
I I M : 1 1 1 I I 1 I I I t I 1 ! t I : : 1 1 I 1 1 I t I I I I { | | | | t I I ( ! I ! I i I 1 I I 1 I t 1 1 1 I t 1 
GIQGRLVSVREHNERQMADKVIGFLS DGLWAQVSDAGT PAVCDPGAKLARRVREAGFKV 
70 80 90 100 110 120 

130 140 150 160 170 180 

VPWGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIG 

I I I I I I I I I I I I I I I I II I I I 1 I I II I II I I I I I II I I I I I I I I I I I : I I I I I II M I 
VPWGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIG 
130 140 150 160 170 180 



190 200 210 220 230 240 

^TLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD 
! I 1 I I I I I I I I I I I I t I 1 I I I I I I I I II ! I I t I! I I I I I : I It M I I t I I I I I 1 I I I I I 
^TLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD 
190 200 210 220 230 240 

I 250 260 270 280 290 

5KHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 

j I I I | I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I i 1 I I I I I I I I I I I I 1 
EKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 
250 260 270 280 290 



Furthermore, ORF75ng-1 shows significant homology to a hypothetical E. coli protein:

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION
(F286)

>gi|606086 (U18997) ORF_f286 [Escherichia coli]

>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic
region [Escherichia coli] Length = 286
Score = 218 bits (550), Expect = 3e-56

Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%)



Query: 
Sbjct : 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 



64 



60 



KH LQKAS DS WGGTLYWAT P I GNLAD IT LRALAVLQKAD 1 1 CAE DTR VT AQ LLS A YG I Q 63 
K Q A +S G LY+V TPIGNLADIT RAL VLQ D+I AEDTR T LL +GI 
KQHQSADNSQ — GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59 

GRLVSVREHNERQMADKVIGFLSDGLWAQVSDAGTPAVCDPGAKLARRVREAGFKWPV 123 

RL ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG L R REAG +WP+ 
ARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPGYHLVRTCREAGIRWPL 119 



124 VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPWMFETPHRIGATL 183 

G A + ALS AG+ F + GF+P KS RR ++ +E+ HR+ +L 

120 PGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAEPRTLIFYESTHRLLDSL 17 9 

184 ADMAELFPERR-I^LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEK 242 

D+ + E R ++LARE+TKT+ET VGE+ + D N+ +GEMVL++ + 

180 EDIVAVLGESRYVVl*ARELTKTWETIHGAPVGEUjAWVKEDENRRKGEMVLIV--EGHKAQ 238 

24 3 HEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL 286 

EL A + +L AELP K+AA LAA+I G K ALY AL 
239 EEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALYKYAL 282 
Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
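
The percentages in the BLAST summaries follow directly from the reported counts. As a quick consistency check, the sketch below recomputes them; `blast_percent` is a hypothetical helper, and rounding to the nearest integer is an assumption (truncation gives the same values for these counts).

```python
def blast_percent(count, aligned_length):
    """Whole-number percentage as shown in a BLAST summary line."""
    return round(100 * count / aligned_length)


# ORF75ng-1 against the E. coli protein: counts over 284 aligned positions.
print(blast_percent(128, 284))  # identities -> 45
print(blast_percent(171, 284))  # positives  -> 60
print(blast_percent(4, 284))    # gaps       -> 1

# ORF66ng against the E. coli protein: counts over 203 aligned positions.
print(blast_percent(132, 203))  # identities -> 65
print(blast_percent(155, 203))  # positives  -> 76
```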

Example 35 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 295>:

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 
51 TTTTGCGGCA GC.AAAGCAC CCGAAATCGA CCCGGCTTTG - 

651 GAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC

751 AAACCGTAA 

This corresponds to the amino acid sequence <SEQ ID 296; ORF76>: 

1 MKQKKTAAAV IAAMLAGFAA XKAPEIDPAL 

201 ELVRNQLEQG LRQEKARLKI DALLEENGVK

251 P* 

Further work revealed the complete nucleotide sequence <SEQ ID 297>: 

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTACAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAGACGAGCT 

351 GCACAAGTTT TACGAACAGC AAATCCGCAT GATCAAATTG CAGCAGGTCA

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC

551 AGTTTGCCGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC 

751 AAACCGTAA 
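The sequence listings in this section follow one layout: blocks of ten residues, five blocks per line, each line prefixed with the 1-based position of its first residue. A minimal Python sketch of that layout is given below; the function name `format_sequence` and its parameters are illustrative and are not taken from the source.

```python
def format_sequence(seq, block=10, blocks_per_line=5):
    """Lay out a sequence in space-separated blocks, one position label per line.

    Whitespace in the input is ignored, so a listing can be pasted as-is.
    """
    seq = "".join(seq.split()).upper()
    line_len = block * blocks_per_line
    lines = []
    for start in range(0, len(seq), line_len):
        chunk = seq[start:start + line_len]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        # The label is the 1-based position of the first residue on the line.
        lines.append(f"{start + 1} " + " ".join(blocks))
    return "\n".join(lines)


# First 60 bases of SEQ ID 297, as listed above.
print(format_sequence(
    "ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG TTTTGCGGCA"
))
```

Because the label is derived from the offset rather than typed by hand, a listing regenerated this way cannot carry mismatched position numbers.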

This corresponds to the amino acid sequence <SEQ ID 298; ORF76-1>:

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ

51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE

101 EYVRFLERSE TVSEDELHKF YEQQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 

201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV 

251 KP*
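The relationship between a nucleotide listing and the amino acid sequence it "corresponds to" can be checked by translating the reading frame with the standard genetic code. The sketch below is illustrative only: the name `translate` is hypothetical, and reporting any codon that contains an unreadable base as `X` is an assumption chosen to match the `X` residues that appear in the partial sequences.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, usable, 3))


# First 90 bases of SEQ ID 297.
print(translate(
    "ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG "
    "TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG"
))
```

Applied to the first 90 bases of SEQ ID 297, this gives the first 30 residues of the protein listing; the final nine bases `AAACCGTAA` translate to `KP*`.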

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF76 shows 96.7% identity over a 30aa overlap and 96.8% identity over a 31aa overlap with an
ORF (ORF76a) from strain A of N. meningitidis:

                    10        20        30
orf76.pep   MKQKKTAAAVIAAMLAGFAAXKAPEIDPAL
            |||||||||||||||||||| |||||||||
orf76a      MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND

orf76.pep                                XELVRNQLEQGLRQEKARLKIDALLEENGVKPX
                                          ||||||||||||||||||||||:|||||||||
orf76a      DVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLKIDAILEENGVKPX
                  200       210       220       230       240       250
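The identity figures quoted for these overlaps are the fraction of aligned positions at which both sequences carry the same residue. A minimal sketch of that calculation follows; `percent_identity` is a hypothetical helper, and counting the undetermined residue `X` as a mismatch is an assumption (it is the reading under which the N-terminal overlap gives the quoted figure).

```python
def percent_identity(a, b):
    """Percentage of positions with identical residues in two aligned strings.

    The inputs must already be aligned and of equal length (no gaps handled).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# N-terminal 30 aa overlap (ORF76 vs ORF76a); X is counted as a mismatch.
n_term_orf76 = "MKQKKTAAAVIAAMLAGFAAXKAPEIDPAL"
n_term_orf76a = "MKQKKTAAAVIAAMLAGFAAAKAPEIDPAL"

# C-terminal 31 aa overlap (ORF76 vs ORF76a).
c_term_orf76 = "ELVRNQLEQGLRQEKARLKIDALLEENGVKP"
c_term_orf76a = "ELVRNQLEQGLRQEKARLKIDAILEENGVKP"

print(f"{percent_identity(n_term_orf76, n_term_orf76a):.1f}")  # 29/30 -> 96.7
print(f"{percent_identity(c_term_orf76, c_term_orf76a):.1f}")  # 30/31 -> 96.8
```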

The complete length ORF76a nucleotide sequence <SEQ ID 299> is:

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGTC GGCTGCAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT 

351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA 

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCAGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCA TTTTGGAAGA AAACGGTGTC

751 AAACCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 300>: 

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ

51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 

201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDAILEENGV 

251 KP* 

ORF76a and ORF76-1 show 97.6% identity in 252 aa overlap: 

                    10        20        30        40        50        60
orf76a.pep  MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76-1     MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
                    10        20        30        40        50        60

orf76a.pep  AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSESALRQF
orf76-1     AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf76a.pep  YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
orf76-1     YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf76a.pep  LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76-1     LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK
                   190       200       210       220       230       240

                   250
orf76a.pep  IDAILEENGVKPX
            |||:|||||||||
orf76-1     IDALLEENGVKPX
                   250

Homology with a predicted ORF from N.gonorrhoeae

The aligned aa sequences of the N- and C-termini of ORF76 and a predicted ORF (ORF76.ng) from
N. gonorrhoeae show 96.7% and 100% identity in 30 aa and 31 aa overlaps, respectively:
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MKQKKTAAAVIAAMLAGFAAXKAPEIDPAL 30 

II Illlllllll I I II MUM 

mkqkktmaviaamlagfaaakapeidpalvdtlvaqimqqadrhaeqsqrpdgqairnd 60 

elvrnqleqglrqekarlkidalleengvkp 251 

I I I I ill Mi IN I I I Ml HI III I I III ! 
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The complete length ORF76ng nucleotide sequence <SEQ ID 301> is: 
1 ATGAAACAGA AAAAGACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG
51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC
101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA
151 AGACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTGCAAAC
201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA
251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG
301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT
351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA
401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA
451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC
501 GTTCGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTcgc
551 agtttgCCGG TATGAACCGT GGCGACGTTA CCCGCAATCC GGTCAAATTG
601 GGCGAACGCT ATTACCTGTT CAAACTCGGC GCGGTCGGGA AAAACCCCGA
651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGGC
701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAaga Aaacggtgtc
751 AaacCGTAA



This encodes a protein having amino acid sequence <SEQ ID 302>:



1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ

51 RPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAGMNR GDVTRNPVKL

201 GERYYLFKLG AVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV

251 KP* 

ORF76ng and ORF76-1 show 96.0% identity in 252 aa overlap 
                   250
orf76-1.pep IDALLEENGVKPX
            |||||||||||||
orf76ng     IDALLEENGVKPX
                   250



Furthermore, ORF76ng shows significant homology to a B.subtilis export protein precursor: 
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sp|P24327,PRSA BACSU PROTEIN EXPORT PROTEIN PRSA P^CURSOR >gi J ^^^j,^^ IS15269 
33K lipoprotein - Bacillus subtilis >gi|39782 (X57271) 33kDa lipoprotein 

>5tT2226! 2 4YSnll P?i I e325l81 (Y14077) 33kDa lipoprotein (Bacillus subtilisj 
>S 1 2633331 IS !pS^ (299109) molecular chaperonin IBacillus subtilis] 

Length = 292

Score = 50.4 bits (118), Expect = 1e-05

Identities = 48/199 (24%), Positives = 82/199 (41%), Gaps = 32/199 (16%)

Query: 70 VLKNRALKEGLDK DKDVQNRFKIAEASF YAEEYVRFLERSETVSE 114 

VL ++ LDK DK++ N+ K + * + E 

Sbjct: 53 VLTQLVQEKVLDKKYKVSDKEIDNKLKEYKTQUSDQYTAI^KQYGKDYLKEQVKYELLTQ 112 

Query: 115 SA LRQFYERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPN 163 



15 uu« £ y. ^ +++++E I+ + A++A+++LKG FELKY 

Sbjct: 113 KAAKDNIKVTDADIKEYWEGIJCGKIRASHILVADKKTAEEVEKKLKKGEKFEDLAKEYST 172

Query: 164 DEQAFDG FIMAQQLPEPLASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDA 218

U y * D AG f Q+E+ + G+V+ DPVK Y++ K +E D

Sbjct: 173 DSSASKGGDLGWFAKEGQMDETFSKAAFKLKTGEVS-DPVKTQYGYHIIKKTEERGKYDD 231






Query: 219 QPFELVRNQLEQGLRQEKA 237 

EL LEQ L A 
Sbjct: 232 MKKELKSEVLEQKLNDNAA 250 
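The summary of a database hit reports identities, positives and gaps as a count over the alignment length, followed by a whole-number percentage. The sketch below parses such a summary line (written with `=` separators) and recomputes the percentages. The names `parse_blast_summary` and `truncated_percent` are hypothetical; using truncation rather than rounding is an assumption, chosen because it reproduces every percentage in the two hit summaries of this section, including the 70/468 gap count reported as 14%.

```python
import re

# One "Name = count/length (pct%)" field of a hit summary line.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def parse_blast_summary(line):
    """Return {field: (count, alignment_length, reported_percent)}."""
    return {name: (int(count), int(length), int(pct))
            for name, count, length, pct in FIELD.findall(line)}


def truncated_percent(count, length):
    """Whole-number percentage, truncated (assumed convention, see above)."""
    return 100 * count // length


summary = ("Identities = 48/199 (24%), Positives = 82/199 (41%), "
           "Gaps = 32/199 (16%)")
for name, (count, length, reported) in parse_blast_summary(summary).items():
    print(name, truncated_percent(count, length), reported)
```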



Based on this analysis, including the presence of a putative leader sequence and an RGD motif in
the gonococcal protein, it was predicted that the proteins from N. meningitidis and N.gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
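The RGD motif mentioned above can be located by a plain substring search over the gonococcal sequence. The sketch below uses residues 151-200 of SEQ ID 302; `find_motif` and its `first_position` parameter are illustrative names, not part of the source.

```python
def find_motif(sequence, motif, first_position=1):
    """Return the positions of every occurrence of motif (overlaps allowed).

    Positions are numbered so that the first residue of `sequence` is
    `first_position`, which lets a fragment keep its full-length numbering.
    """
    hits = []
    i = sequence.find(motif)
    while i != -1:
        hits.append(first_position + i)
        i = sequence.find(motif, i + 1)
    return hits


# Residues 151-200 of SEQ ID 302 (ORF76ng).
segment = "GLSFEGLMKRYPNDEQAFDGFIMAQQLPEPLASQFAGMNRGDVTRNPVKL"
print(find_motif(segment, "RGD", first_position=151))  # [190]
```

In this fragment the motif spans residues 190-192 (the R that ends the block `LASQFAGMNR` and the `GD` that opens the next).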

ORF76-1 (27.8kDa) was cloned in the pET vector and expressed in E.coli, as described above. The
products of protein expression and purification were analyzed by SDS-PAGE. Figure 10A shows
the results of affinity purification of the His-fusion protein. Purified His-fusion protein was used
to immunise mice, whose sera were used for Western blot (Figure 10B), ELISA (positive result),
and FACS analysis (Figure 10C). These experiments confirm that ORF76-1 is a surface-exposed
protein, and that it is a useful immunogen.
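A molecular weight such as the one quoted for ORF76-1 can be estimated from the amino acid sequence by summing average residue masses and adding one water. The sketch below is a rough estimate under stated assumptions: the function name `molecular_weight` is hypothetical, the mass table holds standard average residue masses, the full-length translation of SEQ ID 297 is used (residue 78 read as E, from the codon GAA), and the source does not say whether the quoted 27.8 kDa refers to the full-length protein, a processed form, or the expressed construct.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def molecular_weight(protein):
    """Approximate average molecular weight in daltons.

    Whitespace and a trailing stop marker '*' are ignored; any other
    unexpected character raises KeyError.
    """
    protein = "".join(protein.split()).rstrip("*")
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


# Full-length ORF76-1, translated from SEQ ID 297.
ORF76_1 = (
    "MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ "
    "KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE "
    "EYVRFLERSE TVSEDELHKF YEQQIRMIKL QQVSFATEEE ARQAQQLLLK "
    "GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL "
    "GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV "
    "KP*"
)
print(f"{molecular_weight(ORF76_1) / 1000:.1f} kDa")
```

The estimate lands in the high 20s of kDa, the same range as the quoted value; an exact match is not expected without knowing which form of the protein the figure describes.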

Example 36

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 303>: 

1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC 

51 CAGCGAAATT GCCTTACCCC TTGGAATTGG GGATTGAAAC CTTACCGGCG 

101 GCAAAAATTG CGGAAACGTT TGCGCTGACA TTTGTGATTG CTGCGCTGTA 

151 TCTGTTTGCG CGTAATAAGG TGACGCGTTT GTTGATTGCG GTGTTTTTTG

201 CGTTCAGCAT TATTGCCAAC AATGTGCATT ACGCGGATTA TCAAAGCTGG 

251 ATGACG 

1201 CAAACCGTAT TCGAGCAGCT GCAAAAGACT CCTGACGGCA

1251 ACTGGCTGTT TGCCTATACC TCCGATCATG GCCAGTATGT TCGCCAAGAT

1301 ATCTACAATC AAGGCACGGT GCAGCCCGAC AGCTATCTCG TGCCGCTAGT

1351 GTTGTACAGC CCGGATAAGG CCGTGCAACA GGCTGCCAAC CAGGCTTTTG 

1401 CGCCTTGCGA GATTGCCTTC CATCAGCAGC TTTCAACGTT CCTGATTCAC 

1451 ACGTTGGGCT ACGATATGCC GGTTTCAGGT TGTCGCGAAG GCTCGGTAAC 

1501 GGGCAACCTG ATTACGGGTG ATGCAGGCAG CTTGAACATT CGCGACGGCA

1551 AGGCGGAATA TGTTTATCCG CAATGA 

This corresponds to the amino acid sequence <SEQ ID 304; ORF81>: 

1 MKKSFLTLVL YSSLLTASEI AYPLELGIET LPAAKIAETF ALTFVIAALY 
51 LFARNKVTRL LIAVFFAFSI IANNVHYADY QSWMT

// 

401 ...QTVFEQL QKTPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 
451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 
501 GNLITGDAGS LNIRDGKAEY VYPQ* 

Further work revealed the complete nucleotide sequence <SEQ ID 305>: 

1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC 

51 CAGCGAAATT GCCTATCGCT TTGTATTTGG GATTGAAACC TTACCGGCGG 

101 CAAAAATTGC GGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGT GACGCGTTTG TTGATTGCGG TGTTTTTTGC 

201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TGACGGGCAT CAATTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC

301 AGCGCGGGTG CGTCGATGTT GGATAAGTTG TGGCTGCCTG TGTTGTGGGG 

351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC

451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC 

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAGGATTCC CGCCTTTAAG 

601 CAGCCTGCTC CAAGCAAAAT CGGGCAGGGC AGTGTTCAAA ATATCGTCCT

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAGCTG TTTGGCTACG 

701 GACGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG 

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACTG CAGTGTCCCT 

801 GCCCAGTTTT TTCAATGCGA TACCGCACGC CAACGGCTTG GAACAAATCA 

851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA 

901 ACGTATTTTT ACAGCGCGCA GGCGGAAAAC GAGATGGCGA TTTTGAACTT 

951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT 

1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC 

1051 AAAATCAATT TGCAGCAGGG CAAGCATTTT ATCGTGTTGC ACCAACGCGG 

1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG 

1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA 

1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTAGTG 
1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 
1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 
1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG 
1501 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA 
1551 GGCGGAATAT GTTTATCCGC AATGA 

This corresponds to the amino acid sequence <SEQ ID 306; ORF81-1>:

1 MKKSFLTLVL YSSLLTASEI AYRFVFGIET LPAAKIAETF ALTFVIAALY 

51 LFARYKVTRL LIAVFFAFSI IANNVHYAVY QSWMTGINYW LMLKEVTEVG 

101 SAGASMLDKL WLPVLWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF 

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSRIPAFK 

201 QPAPSKIGQG SVQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNAIPHANGL EQISGGDTNM FRLAKEQGYE 

301 TYFYSAQAEN EMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD 

351 KINLQQGKHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD 

401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 

451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

501 GNLITGDAGS LNIRDGKAEY VYPQ* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF81 shows 84.7% identity over a 85aa overlap and 99.2% identity over a 121aa overlap with 
an ORF (ORF81a) from strain A of N. meningitidis: 
orf81.pep MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALYLFARNKVTRL
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orf 81a 

orf81.pep 
LIAVFFAFSIIANNVHYAVYQSWITGINYWLMLKEITEVGGAGASMLDKLWLPALWGVLE

— 70 80 90 100 HO 120 

// 

120 130 140 

QTVFEQLQKTPDGNWLFAYTSDHGQYVRQD 

I I I I I II I I I I I I I I I I I I I I I I I H I I I 
IPHANGLEQISGGDIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD

280 290 300 310 320 330 

150 160 170 180 190 200 

IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG
340 350 360 370 380 390 
             210       220       230
orf81.pep   CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
            ||||||||||||||||||||||||||||||||
orf81a      CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
             400       410       420

The complete length ORF81a nucleotide sequence <SEQ ID 307> is:
1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCGTCCC TACTTACTGC
51 CAGCGAAATT GCTTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG
101 CAAAAATGGC AGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT
151 CTGTTTGCGC GTTATAAGGC AACGCGTTTG TTGATTGCGG TGTTTTTCGC
201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA
251 TAACGGGCAT TAATTATTGG CTGATGCTGA AAGAGATTAC CGAAGTTGGC
301 GGCGCAGGGG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CGTTGTGGGG
351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA
401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC
451 GTGCGTTCGT TCGACACGAA ACAAGAACAC GGTATTTCGC CCAAACCGAC
501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC
551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATTCC TGTGTTCAAA
601 CAGCCTGCTC CAAGCAGAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT
651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGCTACG
701 GGCGCGAAAC TTCGCCGTTT TTGACCCAGC TTTCGCAAGC CGATTTTAAG
751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT
801 GCCCAGTTTC TTTAACGTCA TACCGCATGC CAACGGCTTG GAACAAATCA
851 GCGGCGGCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC
901 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA
951 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA
1001 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTGGTG
1051 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC
1101 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA
1151 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG
1201 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA
1251 GGCGGAATAT GTTTATCCGC AATGA

This encodes a protein having amino acid sequence <SEQ ID 308>:
1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFVIAALY
51 LFARYKATRL LIAVFFAFSI IANNVHYAVY QSWITGINYW LMLKEITEVG
101 GAGASMLDKL WLPALWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF
151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK
201 QPAPSRIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTQLSQADFK
251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDIVD KYDNTIHKTD
301 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV
351 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT
401 GNLITGDAGS LNIRDGKAEY VYPQ*

ORF81a and ORF81-1 show 77.9% identity in 524 aa overlap:



                    10        20        30        40        50        60
orf81a.pep  MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFVIAALYLFARYKATRL
            ||||:::| ||||||||||||||||||||||||||:||||||||||||||||||||:|||
orf81-1     MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL
                    10        20        30        40        50        60



orf81a.pep 
orf81-l 

orf81a.pep 
orf81-l 

orf 81a. pep 
orf81-l 

orf 81a. pep 
orf81-l 

orf 81a. pep 
orf81-l 

orf 81a .pep 
orf81-l 

orf 81a. pep 
orf81-l 

orf 81a. pep 
orf81-l 



70 80 90 100 110 120

LIAVFFAFSIIANNVHYAVYQSWITGINYWLMLKEITEVGGAGASMLDKLWLPALWGVLE 
M I I I I I I I i | I I | i I I I 1 I I I t : I M M i I | | | | : j | | | : | I I I M I I ! I II : I I I I I I 
LIAVFFAFSIIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE 

70 80 90 100 110 120

130 140 150 160 170 180 

VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY
I t 1 1 I I 1 1 I t I I 1 I I I I I I I I I I S t I I t M I I I I I | I | | I I I I I f I 1 I I M I t ! I i I t I I 
VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY 

130 140 150 160 170 180 

190 200 210 220 230 240 

FVGRVLPYQLFDLSKIPVFKQPAPSRIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPF
I I M I I I I t I I I I I :! I : I I I I M I : M I I I: I I 1 I I III III I I I I I I I I I I I I I I I I I 
FVGRVLPYQLFDLSRIPAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF 

190 200 210 220 230 240 

250 260 270 280 

LTQLSQADFKPIVKQSYSAGFMTAVSLPSFFNVIPHANGLEQISGGD

||:|||IIM IMIIHM Mill IN I 1111:1 11! II II Mill I 
LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNAIPHANGLEQISGGDTNMFRLAKEQGYE 

250 260 270 280 290 300 



TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF 
310 320 330 340 350 360 

290 300 310 320 

IVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 

I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I 
IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 
370 380 390 400 410 420 

330 340 350 360 370 380 

AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF

I || | | | | | I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I! I I I I I 1 I I I I M I I II I 
AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 
430 440 450 460 470 480 

390 400 410 420 

LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX 
M I I I 1 1 11 I I I I I I I I I I 1 1 I I I I I M I I I I 1 1 I M I I 1 1 I I I I 
LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
490 500 510 520 



Homology with a predicted ORF from N. gonorrhoeae 

The aligned aa sequences of the N- and C-termini of ORF81 and a predicted ORF (ORF81.ng) from
N. gonorrhoeae show 82.4% and 97.5% identity in 85 aa and 121 aa overlaps, respectively:
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orf81.pep 
orf81ng 
orf 81 .pep 
orf 81ng 
orf 81. pep 
orf 81ng 
orf81 .pep 
orf 81ng 



MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALYLFARNKVTRL 
t | I I : : : I I I I I I I I I I I I II : : I I I I I I I I I : I I I I I II I : I I I I I I I I I I : : I I 
MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL

LIAVFFAFSIIANNVHYADYQSWMT
I I I i I I I I I : M I M I I I MUM 

LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE 

// 

QTVFEQLQKT PDGNWLFAYT S DHGQYVRQD 
I I I M M M M I I II I II I II I I I M I M 
ALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD 



60 
60 



85 



120 



433 



433 



493 



IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG 
I || | 1 1 | 1 1 II I : M II II I I I 1 I I M M 1 1 I I M I M M M 1 1 M M M M I I J M II I 
IYNQGTVQPDSYIVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG 493 




orf81.pep   CREGSVTGNLITGDAGSLNIRDGKAEYVYPQ 524
            |||||||||||||||||||||:|||||||||
orf81ng     CREGSVTGNLITGDAGSLNIRNGKAEYVYPQ

The complete length ORF81ng nucleotide sequence <SEQ ID 309> is: 



1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCATCCC TACTTACCGC
51 rAGCGAAATC GCCTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG
101 CAAAAATGGC GGAAACGTTT GCGCTGACAT TTATGATTGC TGCGCTGTAT
151 CTGTTTGCGC GTTATAAGGC TTCGCGGCTG CTGATTGCGG TGTTTTTCGC
201 GTTCAGCATG ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA
251 TGACGGGTAT TAACTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC
301 agcgcgggcg CGTCGATGTT GGATAAGTTG tggctgcctg ctttgtgggg
351 CGTGGCGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA
401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC
451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC
501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGGC
551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATCCC TGTGTTCAAA
601 CAGCCTGCTC CAAGCAAAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT
651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGTTACG
701 GGCGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG
751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT
801 GCCCAGTTTC TTTAACGTCA TACCGCACGC CAACGGCTTG GAACAAATCA
851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA
901 ACGTATTTTT ACAGTGCCCA GGCTGAAAAC CAAATGGCAA TTTTGAACTT
951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT
1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC
1051 AAAATCAATT TGCAGCAGGG CAGGCATTTT ATCGTGTTGC ACCAACGCGG
1101 ESScGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG
1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC
1201 SSSSStC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA
1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTG CGCCAAGATA
1301 Stacaatca AGGCACGGTG CAGCCCGACA GCTATATTGT GCCTCTGGTT
1351 SSScC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC
1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA
1451 cgttgggcta cgatatgccg gtttcaggtt gtcgcgaagg CTCGGTAACA
1501 SSSS TTACGGGCGA TGCAGGCAGC TTGAACATTC GCAACGGCAA
1551 GGCGGAATAT gtttatccgc aataa

This encodes a protein having amino acid sequence <SEQ ID 310>: 

1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFMIAALY
51 LFARYKASRL LIAVFFAFSM IANNVHYAVY QSWMTGINYW LMLKEVTEVG
101 SAGASMLDKL WLPALWGVAE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF
151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK
201 QPAPSKIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK
251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDTNM FRLAKEQGYE
301 TYFYSAQAEN QMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD
351 KINLQQGRHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD
401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYIVPLV
451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT
501 GNLITGDAGS LNIRNGKAEY VYPQ*

ORF81ng and ORF81-1 show 96.4% identity in 524 aa overlap: 

10 20 30 40 50 60 

ol 10 20 30 40 50 BU 

55 1Q 80 90 100 110 120 

orfSl-l uwraKIlSWH™VYOS^GINYWLraj<OTEVGS»GAS»LDK^ 
orioi gQ 90 100 110 120 

130 140 150 160 170 180 

VMLrcSUtfFRRKTHFSADI^ 

lYlllllllMIIMIMIIIIIlMMMninilliniMUMIMIIHIIUI 
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190 200 210 220 230 240 

orf81ng-1.pep FVGRVLPYQLFDLSKIPVFKQPAPSKIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPF

orf81-1       FVGRVLPYQLFDLSRIPAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF
190 200 210 220 230 240 



orf81ng-l .pep 
orf81-l 



orf81ng-l.pep 
orf81-l 



orf81ng-l.pep 
orf81-l 



orfBlng-l.pep 
orf81-l 



orf81ng-l.pep 
orf81-l 



250 260 270 280 290 300

LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNVIPHANGLEQISGGDTNMFRLAKEQGYE
||||||||||||||||||||||||||||||||:|||||||||||||||||||||||||||
LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNAIPHANGLEQISGGDTNMFRLAKEQGYE

250 260 270 280 290 300

310 320 330 340 350 360 

TYFYSAQAENQMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGRHF 

I I 111111)1:11 I II II Mill II Ml IMIM II I Mltimi II III illtll: It 
TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF 

310 320 330 340 350 360 

370 380 390 400 410 420 

IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 

| | | | 1 | | 1 I I I 1 I I 1 1 I I I 1 I 1 I I I I I I 1 t 1 I 1 I 1 I I I I 1 I I I I I 1 I I I I I 1 I 1 I 1 1 I f I 
IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF

370 380 390 400 410 420 

430 440 450 460 470 480 

AYTSDHGQYVRQDIYNQGTVQPDSYIVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 

|||||||illlllllllMIIIIII:IIMIIIIIIIIIIIIIIillillililHMll 
AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 
430 440 450 460 470 480 

490 500 510 520 

LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRNGKAEYVYPQX 
| MINI MM III I III III Mil lllll U I I: Mill Mill 
LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX

490 500 510 520 



Furthermore, ORF81ng shows significant homology to an E.coli OMP: 

gi|1256380 (U50906) outer membrane adherence protein-associated protein [E. coli]
Length = 547

Score = 87.4 bits (213), Expect = 2e-16
Identities = 122/468 (26%), Positives = 198/468 (42%), Gaps = 70/468 (14%)

Query 25 VFGIETLPAAKMAETFA-LTFMIAALYLFARYKAS — RLLIAVFFAFSMIANNVHYAVYQ 81 

VFGI LA+A LF+++R + RLL+A F + A ++Y 

Sbjct: 29 VFGITNLVASSGAHMVQRLLFFVLTILWKRISSLPLRLLVAAPFVL-LTAADMSISLY- 86 

Query 82 SWMT GINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAEVMLFCSLAKFRRKT 134 

SW T G ++ + EV A ML ++ P L A + L + 
Sbjct: 87 S W CT FGTT FN DG FAI S VLQS D PDEV AKMLG -MYS PYLCAFAFLSLLFLAVI IKYDV 141 

Query: 135 HFSADILFAFLMLMIFVRSF DTKQEHGISPKPTYSRIKAN— YFSFGYFVG 183 

+ L+L++ S D K ++ SP SR +F+ YF 

Sbjct: 142 SLPTKKVTGILLLIVISGSLFSACQFAYKDAKNKNAFS PYILASRFATYTPFFNLNYFAL 201 

Query: 184 RVLPYQ— LFDLSKIPVFKQPAPSKIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPFL 241

+Q L + +? F+ + X VLI+GES ++ L+GY R T+P + 

Sbjct: 202 AAKEHQRLLSIANTVPYFQL SVRDTGIDTYVLIVGESVRVDNKSLYGYTRSTT PQV 257 

Query: 242 TRLSQADFKPIVKQSYSAGFMTAVSLP S FFNVI PHANGLEQI SGGDTNMFRLAKEQG 298

+Q + Q+ S TA+S+P + +V-r H I N+ +A + G 

Sbjct: 258 E — AQRKQIKLFNQAISGAPYTALSVPLSLTADSVLSH DIHNYPDNIINMANQAG 310 

Query: 299 YETYFYSAQA ENQMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQ 355

++T++ S+Q+ +N A+ ++ ++ + Y G DE LLP + Q 
Sbjct: 311 FQT FWLS SQSAFRQNGTAVT S I AMRAMSTVYVRGF DELLLPHLSQALQQ 359 

Query: 356 --QGRHFIVLHQRGSHAPYGALLQPQDKVFGEADIVDK-YDNTIHKTDQMIQTVFEQLQK 412

Q + IVLH GSH P + VF D D YDN+IH TD ++ VFE L+ 

Sbjct: 360 NTQQKKLIVU1LNGSHEPACSAYPQSSAVFQPQDDQDACYDNSIHYTDSLLGQVFELLK- 418 
Query: 413 QPDGNWLFAYTSDHG QYVRQDIYNQG— TVQPDSYIVPL-VLYSP 454

y " D Y +DHG + +++Y G +Y VP+ + YSP 

Sbjct: 419 — DRRASVMYFADHGLERDPTKKNVYFHGGREASQQAYHVPMFIWYSP 4 64 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies. 
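The source does not state how the putative transmembrane domains were identified. One common screening heuristic, shown here purely as an illustration, is a sliding-window hydropathy profile on the Kyte-Doolittle scale, flagging windows of 19 residues whose mean exceeds about 1.6. The function names, the window length and the threshold below are assumptions of this sketch, not parameters taken from the source.

```python
# Kyte-Doolittle hydropathy values per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean exceeds threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, score in enumerate(profile) if score > threshold]


# First 60 residues of SEQ ID 310 (ORF81ng).
n_terminus = "MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL"
print(hydrophobic_windows(n_terminus))
```

Runs of consecutive flagged start positions mark candidate hydrophobic stretches; whether any of them corresponds to a leader sequence or a membrane-spanning segment would need a dedicated prediction method.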

Example 37 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 311>:

1 ACCCTGCTCC TCTTCATCCC CCTCGTCCTC ACAC.GTGCG GCACACTGAC 

51 CGGCATACTC GCCCaCGGCG GCGGCAAACG CTTTGCCGTC GAACAAGAAC 

101 TCGTCGCCGC ATCGTCCCGC GCCGCCGTCA AAGAAATGGA TTTGTCCGCC 

151 yTAAAAGGAC GCAAAGCCGC CyTTTACGTC TCCGTTATGG GCGACCAAGG 

201 TTCGGGCAAC ATAAGCGGCG GACGCTACTC TATCGACGCA CTGATACGCG 

251 GCGGCTACCA CAACAACCCC GAAAGTGCCA CCCAATACAG CTACCCCGCC 

301 TACGACACTA CCGCCACCAC CAAATCCGAC GCGCTCTCCA GCGTAACCAC 

351 TTCCACATCG CTTTTGAACG CCCCCGCCGC CGyCyTGACG AAAAACAGCG 

401 GACGCAAAGG CGAACGcTCC GCCGGACTGT CCGTCAACGG CACGGGCGAC 

451 TACCGCAACG AAACCCTGCT CGCCAACCCC CGCGACGTTT CCTTCCTGAC

501 CAACCTCATC CAAACCGTCT TCTACCTGCG CGGCATCGAA GTCgTACCGC 

551 CCGrATACGC CGACACCGAC GTATTCGTAA CCGTCGACGT A. . . 

This corresponds to the amino acid sequence <SEQ ID 312; ORF83>: 

1 TLLLFIPLVL TXCGTLTGIL AHGGGKRFAV EQELVAASSR AAVKEMDLSA 

51 LKGRKAAXYV SVMGDQGSGN ISGGRYSIDA LIRGGYHNNP ESATQYSYPA

101 YDTTATTKSD ALSSVTTSTS LLNAPAAXLT KNSGRKGERS AGLSVNGTGD 

151 YRNETLLANP RDVSFLTNLI QTVFYLRGIE VVPPXYADTD VFVTVDV..

Further work revealed the complete nucleotide sequence <SEQ ID 313>: 

1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC 

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGATTTG 

151 TCCGCCCTAA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC 

301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT 

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG 

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT 

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG 

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA 

701 AACTGCTGAT TACCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TTTGGACCGG CCCTTACAAA GTCAGCAAAA CCGTCAAAGC 

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATTACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This corresponds to the amino acid sequence <SEQ ID 314; ORF83-1>:

1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY 

101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT 

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLITPK TAAYESQYQE 

251 QYALWTGPYK VSKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP 
301 DVGNEVIRRR KGG*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF83 shows 96.4% identity over a 197aa overlap with an ORF (ORF83a) from strain A of N.
meningitidis:

10 20 30 40 50 

orf83.pep TLLLFIPLVLTXCGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX

III NiMH I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I t I I I I I I I 

orf83a MKTLLXLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL

10 10 20 30 40 50 60 

60 70 80 90 100 110

orf83.pep YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 
I | I I I I I I | I | I I I I I I I I I I I II I I I I I I I I I II 1 I I I I I ! I II I 1 I I I I I I I I I I I I 1 
orf83a YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS

70 80 90 100 110 120 

120 130 140 150 160 170 

orf 83 . pep TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 

20 1 1 1 1 M 1 1 1 1 1 1 1 u M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i n 1 1 ll 1 1 M 1 1 1 i I 

orf 83a TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 
130 140 150 160 170 180

180 190
orf83.pep IEVVPPXYADTDVFVTVDV

          ||||| |||||||||||||

orf83a    IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK

190 200 210 220 230 240

The complete length ORF83a nucleotide sequence <SEQ ID 315> is: 

1 ATGAAAACCC TGCTCNTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG 

151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC

301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT 

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG 

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT 

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGCAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA 

701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This encodes a protein having amino acid sequence <SEQ ID 316>: 

1 MKTLLXLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY

101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT 

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE 

251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP

301 DVGNEVIRRR KGG* 
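The relationship between a nucleotide listing and the protein it encodes can be checked by translating the coding sequence codon by codon. The following is a minimal Python sketch under the assumption that the standard genetic code applies and that the reading frame starts at the first base; the names `CODON_TABLE` and `translate` are illustrative and not taken from the source. A codon containing an ambiguous base (such as N) is reported as X, which is how the unknown residue at position 6 of ORF83a arises.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a coding sequence read from its first base; '*' marks a stop codon.

    Codons containing an ambiguous base (e.g. N) are reported as 'X'.
    Trailing bases that do not fill a whole codon are ignored.
    """
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-base blocks
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of the ORF83a sequence (row 1) and its final row (row 901)
print(translate("ATGAAAACCC TGCTCNTCCT CATCCCCCTC"))
print(translate("GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA"))
```

Row 901 can be translated on its own because 900 bases precede it, so it begins on a codon boundary; rows starting at other positions would need the preceding bases to stay in frame.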

ORF83a and ORF83-1 show 98.4% identity in 313 aa overlap: 

10 20 30 40 50 60 

orf83a.pep MKTLLXLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL
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Ml M I I M M I I I II I I I I I I I I 1 M i I M I II I II I I I I I I I I II M M II I I II I ! I 
T S LLN APAAAL^ LLAN PRDVS FLTN L I QTV FYLRG 

150 160 l'O lb" 
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120. 
180 
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230 
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190 200 210 220 

IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK

MM I I I I I I I I I Ml I I I II I I M I I I I Ml I I II I I 1 M II M I 1 1 I I I I I I I M: I I 
IEWPPEYADTDVEVIVDV 

200 210 220 230 240 
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300 



280 



300 



250 260 270 280 290 

TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP 

I Ml I III I Ml II I I M : I : I M M II I M I I M M I M I M I II M M I M I M M I 
TAAYESQYQEQYALWTGPYKVSKT^ 

250 260 270 280 290 300

310 

DVGNEVIRRRKGGX 
I M M I II I M I M 
DVGNEV I RRRKGGX 
310 



Homology with a predicted ORF from N. gonorrhoeae

ORF83 shows 94.9% identity over a 197aa overlap with a predicted ORF (ORF83.ng) from N.
gonorrhoeae:

orf83.pep 
orf83ng 
orf83.pep 
orf83ng 
orf83.pep 
orf83ng 
orf83.pep 
orf83ng 



TLLLF T OLVLTXCGTLTGI LAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX 
. , . . . ll 1 1 I I mill! I M M M M M I M M M M II II M M M I M II M 
MKTLLLLIPLVLTACGTLTGIPAH^ 

YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 
I IT I I I I II I II II M I I II I I II M II I II I : I M : M II II I M M II I I I I I : I M I 
WS^GWGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS 

TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNET^ 

til 1 1 lilt It 1 1 : M I M 1 1 M 1 M I I M I M M I M 1 1 1 II 11 M I I I 1 1 1 1 M M I 
TSLIJfl^AAALT^ 

IEWPPXYADTDVFVTVDV 
Mill! IMMIIIMM 



IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 

The complete length ORF83ng nucleotide sequence <SEQ ID 317> is: 
1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTACTCACCG CCTGCGGCAC

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC

101 AGGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG

151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCCATC GACGCACTGA

251 TACGCGGCGG CTACCACAAC AACCCCGACA GCGCCACCCG ATACAGCTAC

301 CCCGCCTATG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCGGCGT

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA

401 ACAACGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTCGACCGC GACAGCCGGA

701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC 

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAACCCC

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This encodes a protein having amino acid sequence <SEQ ID 318>:

1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPDSATRYSY 

101 PAYDTTATTK SDALSGVTTS TSLLNAPAAA LTKNNGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE 

251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKNP

301 DVGNEVIRRR KGG* 
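The sequence listings are laid out as numbered rows of ten-residue blocks, the leading number giving the position of the first residue in the row. The following minimal Python sketch shows one way such rows might be joined into a single string so that the stated lengths can be checked; the helper name `join_listing` is hypothetical, and the row format it expects is an assumption based on the layout used here.

```python
import re


def join_listing(lines):
    """Join numbered sequence-listing rows into (start_position, sequence).

    Each row is expected to look like '251 QYALWMGPYS VGKTVKASDR ...':
    a leading position number followed by blocks of up to ten residues.
    """
    start = None
    parts = []
    for line in lines:
        m = re.match(r"\s*(\d+)\s+(.*\S)\s*$", line)
        if m is None:
            continue  # skip blank or unnumbered rows
        if start is None:
            start = int(m.group(1))
        parts.append(m.group(2).replace(" ", ""))
    return start, "".join(parts)


# Last three rows of the ORF83ng protein listing
rows = [
    "201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE",
    "251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKNP",
    "301 DVGNEVIRRR KGG*",
]
start, seq = join_listing(rows)
residues = seq.rstrip("*")  # '*' marks the stop codon, not a residue
last_position = start + len(residues) - 1
print(start, last_position)
```

The last residue falls at position 313, in agreement with the 313 aa overlap reported for the full-length ORF83 proteins.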

ORF83ng and ORF83-1 show 97.1% identity in 313 aa overlap:






10 20 30 40 50 60 

orf83-1.pep MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL
I I I I I I M I I II 1 1 I i i I I I 1 1 I M I t 1 1 I 1 I I I I I I I I i | t I I I I t i I I I I I I I I [ II I 
orf83ng MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 

10 20 30 40 50 60 
orf83-1.pep YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS
I I I I I i I I I I I 1 I I II I I I I I I II I I I I I I I I: II I : I I I I I I I I II I I I II I I I : I II I 
orf83ng YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS

70 80 90 100 110 120 






130 140 150 160 170 180 

orf83-1.pep TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG

llllll IMIIIIItlMIIIIIIIIIIMIII llllllll llillll III II II Mill 
orf83ng TSLLNAPAAALTKNNGRKGERSAGLSVNGTGDYRNETLLANPRDVS FLTN LIQTVFYLRG 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf83-1.pep IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK
I I I I I I I I I I I II I I I II II I I II I I II I I I I I II II I I 1 I I I I I I I II I I I I I I I I : I I 
orf83ng IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK

190 200 210 220 230 240 

250 260 270 280 290 300 

orf83-1.pep TAAYESQYQEQYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP
I II II I I I II I I I I I I I I: I: I I I I I I I I I I I I I I I i I I II II I I I I I I II 1 I I I I I: I 
orf83ng TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKNP 

250 260 270 280 290 300 



310 

orf 83-1 .pep DVGNE V I RRRKGGX 
I I I I I I I I I I I I I I 
orf83ng DVGNEVIRRRKGGX

310 



Based on this analysis, including the presence of a putative ATP/GTP-binding site motif A (P-loop)
in the gonococcal protein (double-underlined) and a putative prokaryotic membrane lipoprotein
lipid attachment site (single-underlined), it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
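The ATP/GTP-binding site motif A (P-loop) is commonly described by the PROSITE-style consensus [AG]-x(4)-G-K-[ST]. The following minimal Python sketch searches a protein for that consensus. It is offered only as an illustration: the analysis software actually used is not identified here, and the function name `find_p_loops` and its `offset` parameter are hypothetical.

```python
import re

# Consensus [AG]-x(4)-G-K-[ST]; the lookahead lets overlapping matches be reported
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein: str, offset: int = 1):
    """Return (position of first residue, matched text) for each P-loop-like match.

    `offset` is the sequence position of the first residue in `protein`.
    """
    return [(offset + m.start(), m.group(1)) for m in P_LOOP.finditer(protein)]


# Residues 251-300 of the gonococcal (ORF83ng) and meningococcal (ORF83-1) proteins
orf83ng_251_300 = "QYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKNP"
orf83_1_251_300 = "QYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP"

print(find_p_loops(orf83ng_251_300, offset=251))
print(find_p_loops(orf83_1_251_300, offset=251))
```

Under this consensus the gonococcal segment GPYSVGKT (from residue 257) matches, whereas the corresponding meningococcal segment GPYKVSKT does not, which is consistent with the motif being attributed to the gonococcal protein.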
Example 38

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
319>:

ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 
Satggtt TCCATGATGG CGAATGATGA AATGTTTAAG cctgatgaaa 
tarrCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG 
rAC-ACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA 

gcagctttS gc*g£atga1a tgtacgaatg gataaagaag cccgaaaata 

- . TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 
10 III TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

10 III ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG 

ATCAAAATCT tIgAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC 

JSKtgggS TGCGTACGCT tttagaatgg aaaatatgcg cggacgatcc 
^Saatg gcatcaagcg cattctccag tatctataca ctggataaaa 

A^GT^GA CTTGTAysrr TnnnGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCa GTAATAGTAT TGCTGATTCC 

til CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GagCaGTTAC GGAAAAAAAC 

701 aGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 

ill C^TCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC 

,n 801 AGATATGTTT GTTCCGACAT TGTCCGAaAA ACCCGrAAGC AAGCcgaTTT 

20 H\ SS AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCaTCAAG ^GGCATt 

111 aaAAGAAGTG ACGGaGTTGA TGTGccaAgG aCTATGTaAA AAacGGCTTG 

1001 CCGTTTAACC CaTACAAAGA AGAAAGCCAA GGGCAGGAAG TTCAGCAAAG 

„ c CGCGCAa CAA CATTCGGACA GGGCGcCAAG TTGCCACATT GGGCGGAAAA 

25 All CCGTAGCAGA ACCTAATGTA CGATAATTGG GAAGAACGCG GGAAACCGTT 

1151 TGAAGGAATC GGaCGGGGGC GTGGTCGGAT CGGCAAACTG A 

This corresponds to the amino acid sequence <SEQ ID 320; ORF84>: 

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDEKAIRRKV FTNIKGLKIP
51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR
101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN
151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYX XAEVHTVNKV
201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV
251 LPDKTEGEPV NNGNLTADMF VPTLSEKPXS KPIYNGVRQV RTFEYIAGCI
301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS
351 AQQHSDRAQV ATLGGKPXQN LMYDNWEERG KPFEGIGGGV VGSAN*

Further work revealed the complete nucleotide sequence <SEQ ID 321>: 

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 
51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA 
101 ACGGCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG
151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA
201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA
251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC
301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG
351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG
401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC
451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC
501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA
551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC
601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCA GTAATAGTAT TGCTGATTCC
651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GAGCAGTTAC GGAAAAAAAC

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 
751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC
801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT
851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA

901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCATCAAG GGACGGCATT
951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC
1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC
1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACATTGG GCGGAAAACC
1101 GTAGCAGAAC CTAATGTACG ATAATTGGGA AGAACGCGGG AAACCGTTTG
1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA

This corresponds to the amino acid sequence <SEQ ID 322; ORF84-1>:

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP
51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR
101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN
151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV
201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV
251 LPDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCI
301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS
351 AQQHSDRAQV ATLGGKP*QN LMYDNWEERG KPFEGIGGGV VGSAN*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF84 shows 93.9% identity over a 395aa overlap with an ORF (ORF84a) from strain A of N.
meningitidis:



orf 84 .pep 
orf 84a 

orf 84. pep 
orf 84a 



erf 84. pep 
orf 84a 

orf 84. pep 
orfB4a 

orf 84 .pep 
orf84a 

orf 84. pep 
orf84a 

orf 84 .pep 
orf 84a 



10 20 30 40 50 60 

MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK 

, I I II I | | I I I I M I I I II 1 1 H I I Ml I II I I I I M I I I I II I I I I I I I I I H I I M 
MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 

10 20 30 40 50 60 



70 80 90 100 110 120 

LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 
| 1 I | I I | | I I I I I I I 1 I I I 1 I I I I I I I I t I I I t I I I I 1 t I I I I I 1 I I 1 I t I I 1 I 1 I I I I t 
LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

70 80 90 100 110 120 



130 140 150 160 170 180 

I D I FVLTQG PKLL DQN LRT LVRKH YH IASNKMGMRT LLEWK I C ADD P VKMAS S AFS S I YT 

Him mi 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ■« i * 1 1 1 1 1 ii 

I DIFVLTQGSKLtDQNLRTLVRKHYH IASNKMGMRT LLEWK I CADDPVKMASSAFSS I YT 
130 140 150 160 170 180 

190 200 210 220 230 240 

Tnv^rvnT.vvYftrVHTVNKVKRSKW FYTLPVIVLLIPVFVGL SYKMLSSYGKKQEEPAAQ 

1IHIMM ||lllllllllllll!imi:MIIIIIIIMHIIimillllHi 
I pvinTvnT.VFgaP.VHTVNKVKRSKW FYTLPVI ILLIPVFVGL SYKMLSSYGKKQEEPAAQ 

200 210 220 230 240 



190 



290 



300 



250 260 270 280 

ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI 

l,llll:|M: I I f I I t I I I t 1 I 1 I I 1 1 I I 1 I I t 1 1 1 I II IN mi I mil I I II: 
ESAATEHQAVFQDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV 

250 260 270 280 290 300 

310 320 330 340 350 360 

EGGRTGCACYSHQGTALKEVTELKCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 

I | || | | | : | I | | HI 111 I : I : I Ml I :: I I III Ml I M III :: III I Ml I1 1 II 
EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV 

310 320 330 340 350 360 

370 380 390 

ATLGGKPXQNLMYDNWEERGKPFEGIGGGWGSANX 
Mill I I IIMIMIHMIMMMMIilllM 
ATLGGKPWQNLMYDNWQERGKPFEGIGGGWGSANX 

370 380 390 



The complete length ORF84a nucleotide sequence <SEQ ID 323> is: 



1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT
51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCGGATGAAA
101 ACGGCATACG CCGTAAAGTA TTTACGAACA TCAAAGGCTT GAAGATACCG
151 CACACCTACA TAGAAACGGA CGCGAAAAAG CTGCCGAAAT CGACAGATGA
201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA
251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC
301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG
351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGCTCT AAGCTTCTAG
401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC
451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC
501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA
551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC
601 AAGCGGTCAA AATGGTTTTA TACTCTGCCA GTAATAATAT TGCTGATTCC
651 CGTTTTTGTC GGCCTGTCCT ATAAAATGTT AAGTAGTTAT GGAAAAAAAC
701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA TCAGGCAGTA
751 TTTCAGGATA AAACAGAAGG CGAGCCGGTA AACAACGGTA ACCTTACCGC
801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT
851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTGTA
901 GAAGGCGGAA GAACCGGATG CACATGCTAT TCGCATCAAG GGACGGCATT
951 GAAAGAAATT ACAAAGGAAA TGTGCAAGGA TTACGCAAGA AACGGATTGC
1001 CGTTTAACCC ATATAAAGAA GAAAGCCAAG GGCGGGATGT CCAGCAAAGT
1051 GAGCAGCACC ATTCGGACAG ACCGCAAGTT GCCACGTTGG GCGGAAAGCC
1101 GTGGCAAAAT CTTATGTATG ATAATTGGCA GGAGCGCGGA AAACCGTTTG
1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA



This encodes a protein having amino acid sequence <SEQ ID 324>:

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP
51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR
101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGS KLLDQNLRTL VRKHYHIASN
151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV
201 KRSKWFYTLP VIILLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEHQAV
251 FQDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCV
301 EGGRTGCTCY SHQGTALKEI TKEMCKDYAR NGLPFNPYKE ESQGRDVQQS
351 EQHHSDRPQV ATLGGKPWQN LMYDNWQERG KPFEGIGGGV VGSAN*



ORF84a and ORF84-1 show 95.2% identity in 395 aa overlap: 






orf84a.pep 
orf84-l 

orf84a.pep 
orf84-l 

orf84a.pep 
orf84-l 

orf84a.pep 
orf84-l 

orf84a .pep 
orf84-l 

or£84a.pep 
orf84-l 

or£84a.pep 
orf84-l 



10 20 30 40 50 60 

MAEICLITGTPGSGKTIJCWSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 

IIIUtllllllHIIIfllllMlilltllllllliliMlllllllllllMMMII 
MAEICLITGT PGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 
10 20 30 40 50 60 

70 80 90 100 110 120 

LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 
| I | | | | I | | I M I I M I I I I I I I I t I I I t I I I i I t I I I t t I I I i I I ! I I I I 1 i t I I I I I I 
LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

70 80 90 100 110 120 

130 140 150 160 170 180 

IDIFVLTQGSKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 
lllllilll i 1 | 1 1 I I 1 ! I I I 1 i t I I I 1 I t I I !! I 1 1 1 1 I M I I I I I I I 1 I I I I I I 1 I t 
I DI FVLTQG PKLLDQNLRT LVRKHYH I ASNKMGMRTLLE WKI CADDPVKMAS S AFS S I YT 

130 140 150 160 170 180 

190 200 210 220 230 240 

LOKKVYDLYESAE\^TVNKVKRSKWFYTLPVIILLIPVFVGLSYKMLSSYGKKQEEPAAQ 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 1 I 1 1 ! I 1 1 1 1 I - 1 1 1 1 1 I ! 1 1 1 1 1 1 1 I 1 1 1 1 1 1 1 Mill 
LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 

190 200 210 220 230 240 

250 260 270 280 290 300 

ESAATEHQAVFQDiCTEGEPVNKGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV 

||||||:|||: mimilMIIMillimillMIIIMimillllMMII: 
ESAATEQOAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI 
250 260 270 280 290 300 

310 320 330 340 350 360 

EGGRTGCTCY SHQGTALKEITKEMCKDYARNGLP FN PYKEESQGRDVQQSEQHHSDRPQV 

1 t | I I I I : I I I I I M I I I I : I : 1 I M I : : I I I 1 1 t I I I I I I I I : : I I I I 1:1111 M 
EGGRTGCACYSHQGTALKEVTEIUCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 

310 320 330 340 350 360 

370 380 390 

ATLGGKPWQNLMYDNWQERGKPFEGIGGGWGSANX 

| I I I 1 1 I I I I I I It I : I I I It I I I It I I I I I H I I 
ATLGGKPXQNLMYDNWEERGKPFEGIGGGWGSANX 

370 380 390 
Homology with a predicted ORF from N. gonorrhoeae

ORF84 shows 94.2% identity over a 395aa overlap with a predicted ORF (ORF84.ng) from N. 
gonorrhoeae: 

orf 84. pep MAEICLITGTPGSGKTLKMVSMMANDEMETCPDEKAIRRKVFTN^ 60 

M^ICLITCTPGSGKTL^ 60 
LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDE^^ I 20 

10 crf84ng LPKSTDEQLS^DMYEWIKKPENVGAIVIV 120 



orf84ng 
orf 84 .pep 



180 
180 

15 orf84 pep LDKKVYDLYXXAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEP^Q 240 

I 1 1 | | | | || M : | | I I I M I I I I I I : I I M : II i I : M I I I M I I : I I I I M I I I I I I 
LDKKWDLYESAEIHTVNKVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQ 240 



orf 8 4 . pep IDIIVLTQGPKUJCtf^^ 

I | 1 1 1 1 1 1 1 1 1 1 I i I I I II M I I I I 1 : 1 1 1 1 : I M I I I I : I I I M I i I I I M I I I I II 

orf84ng I D I ^LT^ PKLLDQNLRT LVKRH YH I AAN KMG LRT LLEWKVC AD D P VKMAS S AF S S I YT 



orf84ng 



300 



lf\ rt -fR4 npn ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI 

20 o.f84. pep ^ nil ii 1 11 1 1 1 1 1 1 1 miniiiiiim in niiiiin mniiiini 

orf 84ng es^TEQQAVLPDKTEGESWNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFE^ 300 

orf84 oeo EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 360 

orf84 .pep Lb | | I I I I I I I I I I I I I I I I I I I I I I I M I II I I I I I I I I I I I I I » II 

orf84ng egGRTGCTCYSHQGTALKEVTE^ 360 

orf84 oep A TLGGKPXQNLMYDNWEERGKPFEGIGGGWGSAN 395 
* P F || Kill II I I I II 1 II M I I I I II I II I I I I M 
30 orf84ng ATLGGKPQQNLMYDNWEERGKPFEGIGGGWGSAN 395 

The complete length ORF84ng nucleotide sequence <SEQ ID 325> is: 

1 ATGGCAGAAA tctgtttgat aaccggcacg cccggttcag ggaaaacatt 
51 aaaaatggtt tccatgatgg caaacgatga aatgtttaag ccagatgaaa
101 acggcgtacg ccgtaaagta tttacgaaca tcaaaggttt gaagataccg 

151 CACACCCACA TAGAAACAGA CGCAAAGAAG CTGCCGAAAT CAACCGATGA

201 ACAGCTTTCG GCGCATGATA TGTATGAATG GATCAAGAAG CCTGAAAacg

251 tcggcgCAAT CGTTATTGTC GATGAGGCGC AAGACGTATG GCCCGCACGC

301 TccgCAGGTT CGAAAATCCC CGAAAACGTC CAATGGCTGA ACACACACAG 

351 GCATCAGGGC ATAGATATAT TTGTATTGAC ACAAGGTCCT AAACTCTTAG 

401 ATCAGAACTT GCGAACATTG GTTAAAAGAC ATTACCACAT TGCGGCCAAC

451 AAAATGGGTT TGCGTACCCT GCTTGAATGG AAAGTATGCG CGGATGACCC

501 GGTAAAAATG GCATCAAGTG CATTTTCCAG TATCTACACA CTGGATAAAA 

551 AAGTTTATGA CTTGTACGAA TCCGCAGAAA TTCACACGGT AAACAAAGTC 

601 AAGCGTTCAA AATGGTTTTA TGCATTGCCC GTCATCATAT TATTGATTCC 

651 GCTATTTGTC GGTTTGTCTT ACAAAATGTT GGGCAGTTAC GGAAAAAAAC

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 

751 CTTCCGGATA AAACAGAAGG AGAATCGGTG AATAACGGAA ACCTTACGGC 

801 AGATATGTTT GTTCCGACAT TGCCCGAAAA ACCCGAAAGC AAGCCGATTT 

851 ATAACGGTGT AAGGCAGGTA AGGACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CACCTGCTAT TCGCATCAAG GGACGGCATT

951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC

1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC 

1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACCTTGG GCGGAAAACC 

1101 GCAGCAGAAC CTAATGTACG ACAATTGGGA AGAACGCGGG AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA

This encodes a protein having amino acid sequence <SEQ ID 326>: 

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGVRRKV FTNIKGLKIP

51 HTHIETDAKK LPKSTDEQLS AHDMYEWIKK PENVGAIVIV DEAQDVWPAR

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VKRHYHIAAN

151 KMGLRTLLEW KVCADDPVKM ASSAFSSIYT LDKKVYDLYE SAEIHTVNKV

201 KRSKWFYALP VIILLIPLFV GLSYKMLGSY GKKQEEPAAQ ESAATEQQAV

251 LPDKTEGESV NNGNLTADMF VPTLPEKPES KPIYNGVRQV RTFEYIAGCI

301 EGGRTGCTCY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS

351 AQQHSDRAQV ATLGGKPQQN LMYDNWEERG KPFEGIGGGV VGSAN* 
ORF84ng and ORF84-1 show 95.4% identity in 395 aa overlap:
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orf84-l.pep 
orf84ng 



orf 84-1. pep 



orf 84ng 



orf 84-1. pep 
orf 84ng 



orf 84-1. pep 
orf 84ng 



orf 84-1 .pep 
orf84ng 



orf 84-1 .pep 
orf 84ng 

orf84-l.pep 
orf84ng 



10 20 30 40 50 60 

MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPKTYIETDAKK 

MIIUIIIM I 1 1 1 I I 1 I 1 I I 1 1 1 1 1 r 1 1 1 | 1 | | I I I 1 I I I 1 I - 1 I I 1 I 1 I 

MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK 

10 20 30 40 50 60 

70 80 90 100 110 120 

LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 
M I 1 1 I M I I I I I t I M M ! I I > ' I : I M M f I I M M I I 1 I I I ! M I M t I I I I I 11 M 
LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

70 80 90 100 HO 120 

130 140 150 160 170 180 

IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 
It! II lllllt! lt!M!MI::t II ll:IMI:IUIIII:MM MM I INI Hill 
IDIFVLTQGPKLLDQNLRTLVKRHYHIAANKMGLRTLLEWKVCADDPVKMASSAFSSIYT 

130 140 150 160 170 180 

190 200 210 220 230 240 

LDKKVYDLYESAEVHTVKKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 
M I I II I I II I I I : I I I M M I I M M : I I I I : I 1 1 1 : I I M I I I I I : I II I II I I I I I I 
LDKKVYDLYESAEIHTVNKVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAO 
190 200 210 220 230 240 

250 260 270 280 290 300 

ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I II I II II II It M 1 1 I 1 1 
ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI 

250 260 270 280 290 300 

310 320 330 340 350 360 

EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQOHSDRAQV 

I I | | | | 1: I I II I I I I II I I I I I I I I I I I I I I I I II II I I I I i I I I I I I I I I I I I I I M I 
EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 
310 320 330 340 350 360 

370 380 390 

ATLGGKPXQNLMYDNWEERGKPFEGIGGGWGSANX 

M I I I t I I I I I I I I II i I I I I I I I I II I I I I I I I I 
AT LGGK PQQNLM YDNWEERGK P FEG I GGG VVGSANX 
370 380 390 



Based on this analysis, including the presence of a putative transmembrane domain (single-
underlined) in the gonococcal protein, and a putative ATP/GTP-binding site motif A (P-loop,
double-underlined), it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
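A putative transmembrane domain of this kind is often located by averaging a hydropathy scale over a sliding window. The following minimal Python sketch uses the Kyte-Doolittle scale with a 19-residue window, for which a mean above about 1.6 is the commonly cited indication of a membrane-spanning segment. This is an assumed, illustrative method only: the prediction software actually used is not identified here, and the function name `hydropathy_windows` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein, window=19):
    """Return (1-based window start, mean hydropathy) for every full window."""
    scores = [KD[aa] for aa in protein]
    out = []
    for i in range(len(scores) - window + 1):
        out.append((i + 1, sum(scores[i:i + window]) / window))
    return out


# Residues 201-240 of the gonococcal ORF84ng protein
segment = "KRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ"
best_start, best_mean = max(hydropathy_windows(segment), key=lambda t: t[1])
print(best_start, round(best_mean, 2))
```

For this segment the most hydrophobic window exceeds the 1.6 threshold, in keeping with a membrane-spanning stretch in this region of the gonococcal protein; the exact boundaries reported by other prediction methods may differ.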



Example 39 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 327>: 

1 GTGGTTTTCC TGAATGCCGA CAACGGGATA TTGGTTCAGG ACTTGCCTTT 

51 TGAAGTCAAA CTGAAAAAAT TCCATATCGA TTTTTACAAT ACGGGTATGC

101 CGCGTGATTT CGCCAGCGAT ATTGAAGTGA CGGACAAGGC AACCGGTGAG 

151 AAACTCGAGC GCACCATCCG CGTGAACCAT CCTTTGACCT TGCACGGCAT 

201 CACGATTTAT CAGGCGAGTT TTGCCGACGG CGGTTCGGAT TTGACATTCA 

251 AGGCGTGGAA TTTGGGTGAT GCTTCGCGCG AGCCTGTCGT GTTGAAGGCA 

301 ACATCCATAC ACCAGTTTCC GTTGGAAATT GGCAAACACA AATATCGTCT

351 TGAGTTCGAT CAGTTCACTT CTATGAATGT GGAGGACATG AGCGAGGGCG 

401 CGGAACGGGA AAAAAGCCTG AAATCCACGC TGCCCGATGT CCGCGCCGTT 

451 ACTCAGGAAG GTCACAAATA CACCAAT TACCG

501 TATCCGTGAT GCGCCAGGCC AGGCGGTCGA ATATAAAAAC TATATGCTGC 

551 CGGTTTTGCA GGAACAGGAT TATTTTTGGA TTACCGGCAC GCGCAGCGC.

601 TTGCAGCAGC AATACCGCTG GCTGCGTATC CCCTTGGACA AGCAGTTGAA

651 AGCGGACACC TTTATGGCAT TGCGTGAGTT TTTGAAAGAT GGGGAAGGGC 

701 GCAAACGTCT .GTTGCCGAC GCAACCAAAG GCGCACCTGC CGAAATCCGC

751 GAACAATTCA TGCTGGCTGC GGAAAACACG CTGAACATCT TTGCACAAAA 

801 AGGCTATTTG GGATTGGACG AATTTATTAC GTCCAATATC CCGAAAGAGC

851 AGCAGGATAA GATGCAGGGC TATTTCTACG AAATGCTTTA CGGCGTGATG 

901 AACGCTGCTT TGGATGAAAC CAT.ACCCGG TACGGCTTGC CCGAATGGCA 

951 GCAGGATGAA GCGCGGAATC GTTTCCTGCT GCACAGTATG GATGCGTACA 

1001 CGGGTTTGAC CGAATATCCC GCGCCTATGC TGCTGCAACT TGATGGGTTT 

1051 TCCGAGGTGC GTTCGTCGGG TTTGCAGATG ACCCGTTCCC C.GGTCCGCT

1101 TTTGGTCTAT CTC. . . 

This corresponds to the amino acid sequence <SEQ ID 328; ORF88>: 

1 MVFLNADNGI LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE 

51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLGD ASREPVVLKA

101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLPDVRAV

151 TQEGHKYTNX XXXXXYRIRD APGQAVEYKN YMLPVLQEQD YFWITGTRSX 

201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRXVAD ATKGAPAEIR 

251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKEQQDKMQG YFYEMLYGVM 

301 NAALDETXTR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF 

351 SEVRSSGLQM TRSXGPLLVY L. . .

Further work revealed the complete nucleotide sequence <SEQ ID 329>: 

1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC 

51 TTTTTTCAGC TCCATGCGCT TTGCAGTCGC TTTGCTCAGT CTGCTGGGTA 

101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT 

151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG

201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT 

251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG 

301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC 

351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA 

401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA

451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG 

501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA 

551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT 

601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT 

651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC

701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG 

751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA 

801 TACGGGTATG CCGCGTGATT TCGCCAGCGA TATTGAAGTG ACGGACAAGG 

851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC 

901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA

951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG 

1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC 

1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT 

1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG 

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC

1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA 

1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA 

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC 

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA 

1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC 

1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT 

1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT 

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG 

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT

1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC 

1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC 

1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC 

1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG 

1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG

1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA 

2001 CTTGAATCAT GACTGA 

This corresponds to the amino acid sequence <SEQ ID 330; ORF88-1>:

1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD

51 YXVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE

151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV

201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ

251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT

301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH

351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS

401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD

451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI

501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL

551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS

601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL

651 QKEFPKHVES LQRLGKDLNH D*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF88 shows 95.7% identity over a 371aa overlap with an ORF (ORF88a) from strain A of N.
meningitidis:



orf88.pep 



10 20 30 

MVFLNADNGILVQDLPFEVKLKKFHIDFYN 
I I I I M I I I I I Ml M I II I I I I M I I M 






orf88.pep 
orf88a 



30 orf88.pep 



9 a or f88a AKDFKPES I LGASNLSFRGNVN I SEGQS ADWFLNADNGI LVQDLPFEVKLKKFHI DFYN 

ZU 210 220 230 240 250 260 

40 50 60 70 80 90 

TGMPRDFASDIEVTDKATGEKLERTIRVNHPLTLHGITIYQASFADGGSDLTFKAWNLGD 
■ I I I i I I I 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 I I I I t I M 1 1 1 I I 1 1 1 1 1 1 1 t 1 f I I I I I ! 1 1 1 1 I I I 
TGMPRDFAS D I EVTDKATGEKLERT IRVNHPLT LHG IT I YQAS FADGG SDLT FKAWN LGD 
270 280 290 300 310 320 

100 110 120 130 140 150 

ASREPWLKATSIHQFPLEIGKHKYRLEFDQFTSMNVEDMSEGAEREKSLKSTLPDVRAV 

I | | | | I | I I I I I I I I t I 1 I I I I I M I I I I M M I I 1 I M I II I I I I M I I I I I I I I I II 
nrfBSa ASREPVVLKATSIHQFPLEIGKHKYRlXFDQFTSMNVEDMSEGAEREKSLKSTloNDVRAV 
330 340 350 360 370 380 

160 170 180 190 200 210 

nrfBB pep TQEGHKYTNXXXXXXYRIRDAPGQAVEYKNYMLPVLQEQDYFWITGTRSXLQQQYRWLRI 
orf88.pe P 1 1 | | : ||| I mill I I I I 1 1 I M I 1 1 1 1 1 I 1 1 1 1 1 M I I I I 

orfBSa TQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYMLPVLQEQDYiVITGTRSGLQQQYRWLRI 
390 400 410 420 430 440 

40 220 230 240 250 260 270 

orf88 Pep PLDKQLKADTFMALREFLKDGEGRKRXVADATKGAPAEIREQFMLAAENTLN I FAQKGYL 

orf88.pep PLDKU ,,,,,,,,,,,,,,,,,,,, M | | | II I 1 1 1 1 I I I I t I II I I I t i I I I I 1 I I I 

Q rf88a PLDKQLKADTFMALREFLKDGEGRKRLVADATKGAPAEIREQFMLAAENTLN I FAQKGYL 

45 450 460 470 480 490 500 

280 290 300 310 320 330 

orf88 pep GLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAALDETXTRYGLPEWQQDEARNRFLLHSM 
orfBB.pep uw*. 1 1 || 1 1 1 1 1 1 1 1 1 tl 1 1 i 1 1 I I M ! 1 1 1 1 1 I I I I I I I I I I M M I I I I I I I I I 
GLDEFIT SN I PKEQQDKMQG Y FYEMLYGVMNAALDETI RRYG LPEWQQDEARNRFLLHSM 
510 520 530 540 550 560 



35 



50 orf8Ba 



340 350 360 370 

orf88 pep DAYTGLTEYPAPMLLQLDGFSEVRS SGLQMTRSXGPLLVYL 

55 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 f 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I III! I TOmWD 

DD orf88a DA^GLTEYPAPMliQLIX3FSEVRSSGLQMTRSPGALLV^ 

570 580 590 600 610 620 

orf88a AWVLFSDGK1RFAMSSARSERDLQKEFPKHVESLQRLGKDLNHDX 
60 630 640 650 660 670 

The complete length ORF88a nucleotide sequence <SEQ ID 331> is:

1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC
51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA
101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT
151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG
201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT
251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG
301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC
351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA
401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA
451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG
501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA
551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT
601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT
651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC
701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG
751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA
801 TACGGGTATG CCGCGCGATT TTGCCAGTGA TATTGAAGTA ACGGATAAGG
851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC
901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA
951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG
1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC
1051 AAATATCGTC TTGAGTTCGA TCAGTTTACT TCTATGAATG TGGAGGACAT
1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG
1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC
1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA
1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA
1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC
1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA
1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG
1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC
1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT
1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT
1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG
1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT
1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC
1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC
1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC
1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG
1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG
1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA
2001 CTTGAATCAT GACTGA



This encodes a protein having amino acid sequence <SEQ ID 332>:

1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD
51 YLVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE
151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV
201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ
251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT
301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH
351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS
401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD
451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI
501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL
551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS
601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL
651 QKEFPKHVES LQRLGKDLNH D*



ORF88a and ORF88-1 show 100.0% identity in 671 aa overlap:

orf88a.pep   MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60

orf88a.pep   QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120

orf88a.pep   SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180

orf88a.pep   GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240

orf88a.pep   LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300

orf88a.pep   LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360

orf88a.pep   SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420

orf88a.pep   PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480

orf88a.pep   GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540

orf88a.pep   LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600

orf88a.pep   PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660

orf88a.pep   LQRLGKDLNHD 671
             |||||||||||
orf88-1      LQRLGKDLNHD 671

Homology with a predicted ORF from N. gonorrhoeae

ORF88 shows 93.8% identity over a 371aa overlap with a predicted ORF (ORF88.ng) from N. 
gonorrhoeae: 

orf88.pep 

orf88ng 

orf 88. pep 

orf88ng 

orf 88. pep 

orf88ng 

orf 88 .pep 

orf 88ng 

orf 88. pep 

orf88ng 

orf 88. pep 

orf88ng 

orf 88 .pep 

orf88ng 



MVFLNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNH 

|| | || Ml |:ll III II II MM HUM I Ml II Mill lllll N II Mil II IN M 
MVFLNADNGMLVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNH 

PLTLHGITIYQASFADGGSDLTFKAWNLGDASREPWLKATSIHQFPLEIGKHKYRLEFD 

1 M 1 I I 1 1 M t M M I I I I M 11 M II I I 1 I 1 1 I I t 1 1 I t I I I 1 I I I I 1 MIMMIM 
PLTLHGITIYQASFADGGSDLTFKAWNLRDASREPWLKATSIHQFPLEIGKHKYRLEFD 

QFTSMNVEDMSEGAEREKSLKSTLPDVRAVTQEGHKYTNXXXXXXYRIRDAPGQAVEYKN 

immmimiiMmiii immimiit mm iimm 

QFTSMNVEEWSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKN 

YMLPVLQEQDYFWITGTRSXLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRXVAD 
I || 1: 1 1:: I III: I MM I I Ml I I I t Ml Ml I I I III Ml Ml I 111 Ml I I Ml 
YMLPILQDKDYFWLTGTRSGLQQQYRWLRI PLDKQLKADT FMALREFLKDGEGRKRLVAD 

ATKGAPAE I REQFMLAAENTLN I FAQKG YLGLDE FITSNI PKEQQDKMQGY FYEMLYG VM 
Ml II I M I I M II I M I I 11 M M M M M M I M I M M M M II M M II I M II 

ATKDAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKGQQDKMQGYFYEMLYGVM 

RAALDETXTRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 

I I I | I M I Ml M M I I M M I Ml II I II Ml M Ml M M I II Mi I Ml M M M 
NAALDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 

TRSXGPLLVYL 
III I Ml I I 

TRSPGAUiVYLGSVlAVUSTVFMFYVPKKRAWVLFSNXKIRFAMSSARSERDLQKEFPKH 



60 



60 



120 



120 



180 



180 



240 



240 



300 



300 



360 



360 



371 



420
An ORF88ng nucleotide sequence <SEQ ID 333> was predicted to encode a protein having amino
acid sequence <SEQ ID 334>:

1 MVTXNADNGM LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE
51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLRD ASREPVVLKA
101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLNDVRAV
151 TQEGKKYTNI GPSIVYRIRD AAGQAVEYKN YMLPILQDKD YFWLTGTRSG
201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRLVAD ATKDAPAEIR
251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKGQQDKMQG YFYEMLYGVM
301 NAALDETIRR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF
351 SEVRSSGLQM TRSPGALLVY LGSVLLVLGT VFMFYVPKKR AWVLFSNXKI
401 RFAMSSARSE RDLQKEFPKH VESLQRLGKD LNHD*

Further work revealed the complete gonococcal DNA sequence <SEQ ID 335>: 

1 ATGAGTAAAT CCCGTATATC TCCCACACTT CTTTCCCGTC CGTGGTTCGC 

51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA 

101 TTGCATCGGT TATCGGCACG GTGTTACAGC AAAACCAGCC GCAGACGGAT 

151 TATTTGGTCA AATTCGGACC GTTTTGGACT CGGATTTTTG ATTTTTTGGG 

201 TTTGTATGAT GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTC 

251 TGGTGGTTTC TACCAGTTTG TGTTTAATCC GTAACGTTCC GCCGTTTTGG 

301 CGCGAAATGA AGTCTTTCCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC 

351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCCCCC GAAGTTGCCA 

401 AACGTTATCT GGAGGTGCGG GGTTTTCAGG GAAAAACCGT CAGCCGTGAG 

451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCAcaatga acaaATGGGG

501 CTATATCTTT GCccaagtag ctTTGATTGT CATTTGCCTG GGCGGGTTGA 

551 TAGACAGTAA CCTGCTGCTG AAGCTGGGTA TGCTGGCCGG TCGGATTGTT 

601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT 

651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC 

701 AAAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT GTTGGTTCAG 

751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA 

801 TACGGGTATG CCGCGCGATT TTGCCAGCGA TATTGAAGTA ACGGACAAGG 

851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC 

901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA 

951 TTTGACATTC AAGGCGTGGA ATTTGAGGGA TGCTTCGCGC GAACCTGTCG 

1001 TGTTGAAGGC AACCTCCATA CACCAGTTTC CGTTGGAAAT CGGCAAACAC 

1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT 

1101 GAGCGAGGGT GCGGAACGGG AAAAAAGCCT GAAATCCACT CTGAACGATG 

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC 

1201 ATCGTGTACC GCATCCGTGA TGcggCAGGG CAGGCGGTCG AATATAAAAA 

1251 CTATATGCTG CCGATTTTGC AGGACAAAGA TTATTTTTGG CTGACCGGCA 

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC 

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA 

1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GACGCACCTG

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAATATC

1501 TTTGCGCAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT 

1551 CCCGAAAGGG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT 

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG 

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAC CGTTTCCTGC TGCACAGTAT 

1701 GGATGCCTAT ACGGGGCTGA CGGAATATCC CGCGCCTATG CTGCTCCAGC 

1751 TTGACGGGTT TTCCGAGGTG CGTTCCTCAG GTTTGCAGAT GACCCGTTCG 

1801 CCGGGTGCGC TTTTGGTCTA TCtcggctcg gtattgttgg TTTTGGgtac 

1851 ggtaTttatg tTTTATGTGC GCGAAAAACG GGCGTGGgta tTGTTTTCag 

1901 aCGGCAAAAT CCGTTTTGCT ATGtCTTcgg CCcgcagcga ACGGGATTTG 

1951 cAGAaggaaT TTCCAAAACA CGtcgAGAGC CTGCAACggc tcggcaaggA 

2001 CttgaaTCAT GACTga 

This corresponds to the amino acid sequence <SEQ ID 336; ORF88ng-1>:

1 MSKSRISPTL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD
51 YLVKFGPFWT RIFDFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE
151 DGSVLIAAKK GTMNKWGYIF AQVALIVICL GGLIDSNLLL KLGMLAGRIV
201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGMLVQ
251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT
301 LHGITIYQAS FADGGSDLTF KAWNLRDASR EPVVLKATSI HQFPLEIGKH
351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS
401 IVYRIRDAAG QAVEYKNYML PILQDKDYFW LTGTRSGLQQ QYRWLRIPLD
451 KQLKADTFMA LREFLKDGEG RKRLVADATK DAPAEIREQF MLAAENTLNI
501 FAQKGYLGLD EFITSNIPKG QQDKMQGYFY EMLYGVMNAA LDETIRRYGL
551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS
601 PGALLVYLGS VLLVLGTVFM FYVREKRAWV LFSDGKIRFA MSSARSERDL
651 QKEFPKHVES LQRLGKDLNH D*

ORF88ng-l and ORF88-1 show 97.0% identity in 671 aa overlap: 

orf88-l pep MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASV1GTVLQQNQPQTDYLVKFGSFWA 
otf 88 l.pep | I I | I i i t I i 1 I I | | 1 I M I 1 I M I 1 1 1 1 1 1 1 I I M M I I I I I I I I I II: 

otf88ng-l MSKSRISPTLLSRPWFAF^ 

otf 88-1 . pep QIFGFLGLYDVYASAWmi^FLWSTSLCLIRWPPFtfREMKSF^KVKEKSLAAMRH 

otfBS l.pep u 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 M 1 1 1 1 1 1 1 1 1 1 II I Ml MMI I IIMII 

otf 88ng- 1 RI FDFLGLYDVYASAWFWIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 

orf 88-1 . pep SSLLDVKIAPEVAKRYLEVQGFQGKTINR^DGSVLIAAKKGTMNKWGYIFAHVALIVICL 
oiibb i.pep . ... . | . | | I | 1 1 1 1 I I | | . | 1 1 | | | | | | | | | 1 1 1 1 1 1 1 1 I 1 1 1 I 1 1 I I H 1 1 I I 1 1 I 
otf88ng-l SSLLDVKIAPEVAKRYLEVRGFQGKTVSREIXSSVLIAAKKGTMN 

orf 88-1 . pep GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGl^ISEGQSADVVF 
i-PeP | ,,,,, : ,,,,, | , | | | | | | | | | | | I | I I | | I I I I I I I 1 1 I I I I I III I II 

orf88ng-l gglidsnlllklgmlagrivpdnqavyakdfkpesilgasnlsfrgnvnisegqsadvvf 
otf 88-1 .pep lnadngilvqdlpfevklkkfhidfyntgmprdfasdievtdkatgeklertirvnh^ 

ormo i.p p i . . i I i . I I I I I i i i 1 1 i i i i i i I I I I I I I I I I I 1 1 1 II 1 1 1 I I I I I 1 1 I I I I ' I I I I J I 

otf88ng-i ^adngmlvqdlpfevklkkfhidfwtgmprdfasdiewdkatgekle 
otf88-i pep lhgitiyqasfadggsdltfkawnlgdasrepvvlkatsihqfpleigkhkyrlefdqft 

orf88 l.pep utai w HIMIIIII IIIIIIMIIIIinillllllllinillMII 

Otf88ng-l LHGIT I YQAS FADGGSDLTFKAWNLRDASREPWLKATS IHQFPLE IGKHKYRLEFDQFT 

otf88-l .pep SMNVEDMSEGAEREKSLKSTLNDVPAVTQEGKKYTNIGPSIWM 

P P ... . | . . | | | | | | | | 1 1 | | | | | | | | | | | | | | | | | | | | | | 1 1 I I I I I I I I I I I I I Ml Ml 
otf88ng-l ShBIVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDA^ 

otf 88-1 .pep PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALMFLKDGEGRKRL 

i-P P « n |: I l l I I I || I I I || || I I I I I I I I I I I II I I I I I I I M 1 1 1 1 1 1 1 1 !' J" 
Otf88ng-l PILQDKDYFWLTGTRSGLQQQYRWLRIPLDKQIJCADTFMAIJ^FLKDGEGRKRLVADATK 

otf 88-1 .pep GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEMDKMQGYF^ 
° xpp ,,,,, , I I I , | I 1 1 1 1 | | | | 1 1 | | | | | | I I I I 1 1 1 1 1 1 1 1 1 I I 1 1 1 1 1 I I I I I I I 1 1 I 
otf88ng-l DAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKGQQDKM^^ 

nrfaa-l ceo LDETIRRYGLPBWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 

otf88-l.pep i i i i i i i i i J i i i II I I I I I I I I I I I I I I I I I I 1 1 1 I I I I I 1 1 1 I I I I I I I I I I 

otf88ng-l LDETIRRYGLPEWMDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 

orf 88- 1 oeo PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 
OtfBB l.pep reftW. , , , , , , , , , , , , , | , , , | , , , , , | , , | , | | | | | | | | | | | 

Otf88ng-l PGALLVYLGSVLLVLGTVFM 



60 
60 
120 
120 
180 
180 
240 
240 
300 
300 
360 
360 
420 
420 
480 
480 
540 
540 
600 
600 
660 
660 



50 



orf 88-1. pep LQRLGKDLNHD 
I I I I I I I I I I I 
otf88ng-l LQRLGKDLNHD 



671 
671 



Furthermore, ORF88ng-1 shows homology with a hypothetical protein from Aquifex aeolicus:

gi|2984296 (AE000771) hypothetical protein [Aquifex aeolicus] Length = 537
Cities'- ?ir 3 3; 2 ( 2 U)?ToritIv^ 8 l 5 9/334 ,47*,, Caps = 59/334 (17*, 
Query 16 FAFFSSMRFAVALLSLLGIASVIG-TVLQQNQPQTDYLVKFGPFWTRIFDFLGLYDVYAS 74 

+ F +S++ A* ++ +LGI S++G T ++QNQ YL +FG L L DV+ S 

Sbjct: 80 YDFLASLKLAIFIMLVLGILSMLGSTYIKQNQSFEWYLDQFGYDVGIWIWKLWLNDVFHS 139 

Quety: 75 AWFWIMMFLWSTSLCLIRNVPPFWREMKSFIIEKVKEKSLAAMRHSSLLDVKI^ 134 
U * ++++ ++ L V+ C 1+ +P W++ S +E++ + A +H + VKI P+ K 

Sbjct- 140 WYYILFIVLLAVNLIFCSIKRLPRVWKQAFS-KERILKLDEHAEKHLKPITVKI-PDKDK 197 



Ouerv 135 — RyixVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIFAQVALIVICLGGLIDSNLLLKL 192 
w ++L +gf+ V E + + A+KG +♦ G +AL+VI G LID 
Sbjct- 198 VLKFLLKKGFK-VFVEEEGNKLYVFAEKGRFSRLGVYITHIALLVIMAGALID 



249
Query: 193 GMLAGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVFLNADNGMLVQDL 252

+I+G RG++ ++EG + D v+ + A+ L 

Sbjct: 250 AIVGV RGSLIVAEGDTNDVMLVGAE — QKPYKL 280 

Query 253 PFEVKLKKFHIDFY NTGMPRDFA SDIEVTDKATGEKLER TIRVNHPLT 300 

PFVLFIY N+ + FA SDIE+ + G K+E T++VN P 
Sbjct: 281 PFAVHLIDFRIKTYAEENPNVDKRFAQAVSSYESDIEIIN GGKVEAKGTVKVNEPFD 33*7 

10 Query: 301 LHG I T I YQAS FA — DGGS DLTFKAWNLRDASRE P 332 

++QA++ DG S + + + A +P 

Sbjct: 338 FGRYRLFQATYGILDGTSGMGVIWDRKKAHEDP 371 

Based on this analysis, including the putative transmembrane domain in the gonococcal protein,
it is predicted that the proteins from N.meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 40 

The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID 337>:

1 ATGATGAGTA ATAmAATGGm ACAAAAAGGG TTTACATTGA TTGmGmTGAT
51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT
101 ATCmAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG
151 GyCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA
201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA
251 AGATGAATCC GAAAATTGCC AAAAAaTATA GTGTTTCGGT AAAGTTTGTC
301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG
351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA
401 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA
451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA

This corresponds to the amino acid sequence <SEQ ID 338; ORF89>: 



1 MMSNXMXQKG FTLIXXMIVV AILGIISVIA IPSYXSYIEK GYQSQLYTEM

51 XGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS 

151 DVGCEAFSNR KK* 

Further work revealed the complete nucleotide sequence <SEQ ID 339>:

1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT
51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT
101 ATCAAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG
151 GTCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA
201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA
251 AGATGAATCC GAAAATTGCC AAAAAATATA GTGTTTCGGT AAAGTTTGTC
301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG
351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA
401 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA
451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA

This corresponds to the amino acid sequence <SEQ ID 340; ORF89-1>:

1 MMSNKMEQKG FTLIEMMIVV AILGIISVIA IPSYQSYIEK GYQSQLYTEM

51 VGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV 

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS 

151 DVGCEAFSNR KK* 
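
The correspondence between a listed nucleotide sequence and its amino acid sequence can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code; the `translate` helper and its handling of ambiguous bases are illustrative assumptions and are not part of the original disclosure. Codons containing an ambiguity code (such as the m, y or N symbols used in the listings) are reported as X, which mirrors the X residues shown in the partial sequences.

```python
# Standard genetic code, with codon positions ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA listing codon by codon.

    Whitespace is ignored, case is ignored, a trailing incomplete codon
    is dropped, and any codon with an ambiguous base becomes 'X'.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


# Row 1 of the complete ORF89-1 coding sequence (positions 1-50).
first_row = "ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT"
print(translate(first_row))
```

Applied to the first thirty bases of the earlier, ambiguous listing (`ATGATGAGTA ATAmAATGGm ACAAAAAGGG`), the same helper yields X at the two codons that contain an m, in agreement with the partial ORF89 sequence.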

Computer analysis of this amino acid sequence gave the following results:
Homology with PilE of N. gonorrhoeae (accession number Z69260)
ORF89 and PilE protein show 30% aa identity in 120aa overlap:






orf89 


8 


PilE 


5 


orf89 


67 


PilE 


65 



QKGFTLIXXMTWAILGIISVIAIPSVXSYIEKGYQSQLYTEMXGINNISKQFILKNPL- 66 

OKGFTLI MIV+A1+GI++ +A+P+Y Y + S+ G + + + 

QKGFTLIELMlVIAIVGILAAVALPAYQDYTARAQVSEAILliAEGQKSAVTEYYLNHGIW 64 

- DDNQTIENKLE I FVSGYKMNPKIAKKYSVSVKFVDKEKSRAYIUiVGVPKAGTG YTLSVW 
+ KI KY SV + GV K G LS+W 



125 



DN + 



+G 



10 



Homology with a predicted ORF from N.meningitidis (strain A)

ORF89 shows 83.3% identity over a 162aa overlap with an ORF (ORF89a) from strain A of N.
meningitidis:

orf 89. pep 
orf 89a 

orf89.pep 
orf 85a 

orf 8 9. pep 
orf89a 



10 20 30 40 50 60 

MMSNXMXQKGFTLIXXMIWAILGHSVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 

I | i | | | t | | | | | | | II III I I I I I I I I I II I M 1 M I I I I I I 1 I 
MMSNKMEQKGFTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINNISKQX 

10 20 30 40 50 60 

70 80 90 100 HO 120 

ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 

I I I i I M I I II I :: M I II U I I I 1 1 I I I I : I 1 : 1 1 1 : 1 I :: 1 I IN III I I I: M I I 
ILKNPLDDNQTIKSKLEIFVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY 

70 80 90 100 110 120 

130 140 150 160 

TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 a ( 1 1 1 1 1 

TLSVWMNSVGDGYKCRDAASARAHLETLSSDVGCEAFSNRKKX 
130 140 150 160 



The complete length ORF89a nucleotide sequence <SEQ ID 341> is:

1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGNGANGNT
51 NATNGNCNTC GCGATACNCN GCNTTANCAG CGTCATTNCN ATNNNTNCNT
101 ATCNNAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG
151 GTCGGTATCA ACAATATTTC CAAACAGTNT ATTTTGAAAA ATCCCCTGGA
201 CGATAATCAG ACCATCAAGA GCAAACTGGA AATATTTGTC TCAGGCTATA
251 AGATGAATCC GAAAATTGCC GAAAAATATA ATGTTTCGGT GCATTTTGTC
301 AATGAGGAAA AACCNAGGGC ATACAGCTTG GTCGGCGTTC CAAAGACGGG
351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA
401 AATGCCGTGA TGCCGCTTCT GCCCGAGCCC ATTTGGAGAC CTTGTCCTCA
451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAG



This encodes a protein having amino acid sequence <SEQ ID 342>:

1 MMSNKMEQKG FTLIXXXXXX AIXXXXSVIX XXXYXSYIEK GYQSQLYTEM

51 VGINNISKQX ILKNPLDDNQ TIKSKLEIFV SGYKMNPKIA EKYNVSVHFV 

101 NEEKPRAYSL VGVPKTGTGY TLSVWMNSVG DGYKCRDAAS ARAHLETLSS 

151 DVGCEAFSNR KK* 

ORF89a and ORF89-1 show 83.3% identity in 162 aa overlap:



50 



55 



60 



orf 89a. pep 
orf89-l 



orf 8 9a. pep 
orf 8 9-1 



orf 89a. pep 
orf89-l 



10 20 30 40 50 60 

MMSNKMEQKG FTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINN I SKQX 

K I | 1 1 | | | | K | t I I II 111 I I I I I I I I I I I I I I I I I I I I I I I I I 

MMSNKMEQKGETLIEMMIWAILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF 

10 20 30 40 50 60 

70 80 90 100 110 120 

ILKNPLDDNQTIKSKLEIFVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY 

I t I I I 1 I II I I I I 1 1 I I 1 I 1 1 I I I I I II : I I : t I I : I I I I Ml 111111:1111 
ILKNPLDDNQTIENKIXIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 

70 80 90 100 110 120 

130 140 150 160 

TLS VWMN SVG DGYKCRDAAS ARAHLET LS S DVGCEAFSNRKKX 
1 1 1 I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 I I I 1 1 1 1 1 M I I I I I 1 1 
TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX
130 140 150 160

Homology with a predicted ORF from N.gonorrhoeae

ORF89 shows 84.6% identity over a 162aa overlap with a predicted ORF (ORF89.ng) from N.
gonorrhoeae:

0 -f89 MMSNXMXQKGFTLIXXM1WAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 60 

{111 t I I I I M 1 II I 1:1 Ml II I II II LI I I Mil IN HIM I III!*- HI _ 
orf89ng MMSNKMEQKGFTLIEMMIWTILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNVLKQF 60 

10 or'89 ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 120 

| | I | I | | | : | ::: }| : I H I 1 I I I I I I I I I I I I I I 1 : I II I I 1 1 I I I I M I : I M II 
orf89r.g ILKNPQDDNDTLKSKUCIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY 120 

orf89 TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKK 162 

15 | | 1 M I 1 1 I 1 1 I 1 1 II 1 1 : 1 1 1 I : Ml Ml 1111111111 

orf 89ng TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKK 162 

The complete length ORF89ng nucleotide sequence <SEQ ID 343> is:

1 aTGATGAGCA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT
51 GATAGTTGTC ACGATACTCG GCATCATCAG CGTCATTGCC ATACCTTCTT
101 ATCAGAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG
151 GTCGGTATCA ACAATGTTCT CAAACAGTTT ATTTTGAAAA ATCCCCAGGA
201 CGATAATGAT ACCCTCAAGA GCAAACTGAA AATATTTGTC TCAGGCTATA
251 AGATGAATCC GAAAAttgCC AAAAAATATA GTGTTTCGGt aaggtttGTC
301 gatGCGGAAA AACCAAGGGC ATACAGGTTG GTCGGCGTTC CGAACGCGGG
351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA
401 AATGCCGTGA TGCCACTTCT GCCCAGGCCT ATTCGGACAC CTTGTCCGCA
451 GATAGCGGCT GTGAAGCTTT CTCTAATCGT AAAAAATAG

This encodes a protein having amino acid sequence <SEQ ID 344>:

1 MMSNKMEQKG FTLIEMMIVV TILGIISVIA IPSYQSYIEK GYQSQLYTEM
51 VGINNVLKQF ILKNPQDDND TLKSKLKIFV SGYKMNPKIA KKYSVSVRFV
101 DAEKPRAYRL VGVPNAGTGY TLSVWMNSVG DGYKCRDATS AQAYSDTLSA
151 DSGCEAFSNR KK*

This gonococcal protein has a putative leader peptide (underlined) and N-terminal methylation site
(NMePhe or type-4 pili, double-underlined). In addition, ORF89ng and ORF89-1 show 88.3%
identity in 162 aa overlap:

10 20 30 40 50 60 

orf 89-1 pep MMSNKMEQKGFTLIEMMIWAI LGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF 
| | || | || | | I | I I I I I II I I M I I I I M M II I II I M II II I I I I I M M I M I : Ml 
orf8Sng mmsnkmeqkgftliemmiwtilgiisviaipsyqsyiekgyqsqlytemvginnvlkqf 
40 ~ l0 2Q 3Q 40 50 60 

70 80 90 100 110 120 

orf 89-1. pep ilknplddnqtienkleifvsgykmnpkiakkysvsvkfvdkeksrayrlvgvpkagtgy 
Mill I I |:|:::lt:lllllM! Ill II 1111111:111 I I I I I I I I I I I : I I M I 
45 0-f89ng ilknpqddndtlksklkifvsgykmnpkiakkysvsvrfvdaekprayrlvgvpnagtgy 

70 80 90 100 110 120 

130 140 150 160 

or f 89-1 . pep TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 
50 | | | | | M M II I I I I I I I M I I I :: I I I : I II I I I I M I I I 

orf89ng TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKKX 

13C 140 150 160 

Based on this analysis, including the gonococcal motifs and the homology with the known PilE
protein, it was predicted that these proteins from N.meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF89-1 (13.6kDa) was cloned in the pGex vector and expressed in E.coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 11A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera gave a positive result in the ELISA test, confirming that
ORF89-1 is a surface-exposed protein, and that it is a useful immunogen.
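
The putative leader peptide and N-terminal methylation site noted above for the gonococcal protein can be located with a simple pattern search. The sketch below is illustrative only: the regular expression is a hypothetical, simplified motif chosen for this example (a glycine followed by a hydrophobic residue, S/T, L/T, L/I/V/P and glutamate), and the assumption that cleavage occurs immediately after the glycine reflects the general behaviour of type-4 prepilins rather than any annotation recoverable from this text, since the original underlining is not reproduced here.

```python
import re

# Hypothetical simplified motif (an assumption for illustration): the glycine
# preceding the cleavage site, then the first five residues of the mature
# protein, ending in an invariant glutamate.
PREPILIN_MOTIF = re.compile(r"G([FYLIVM][ST][LT][LIVP]E)")


def split_at_prepilin_site(protein: str):
    """Return (leader, mature) split after the motif glycine, or None."""
    match = PREPILIN_MOTIF.search(protein)
    if match is None:
        return None
    cut = match.start(1)  # index of the first residue after the glycine
    return protein[:cut], protein[cut:]


# N-terminal 40 residues of the gonococcal ORF89ng protein as listed above.
orf89ng_nterm = "MMSNKMEQKGFTLIEMMIVVTILGIISVIAIPSYQSYIEK"
leader, mature = split_at_prepilin_site(orf89ng_nterm)
print("leader:", leader)
print("mature N-terminus:", mature[:10])
```

Under these assumptions the mature protein begins with phenylalanine, which is consistent with the NMePhe designation in the text.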

Example 41 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 345>: 

1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT
51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA
101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT
151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT
201 GACCGCATTG GCGGTCGGCA ACCCTTGGsG CACCG.GTCC GACG.GCAAA
251 AACAAGCGTT GGCCn.AGAA TTTCAACCC...

This corresponds to the amino acid sequence <SEQ ID 346; ORF91>:

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA
51 RQKAEAYAIP YFDFQRMTAL AVGNPWXTXS DXQKQALAXE FQP...

Further work revealed the complete nucleotide sequence <SEQ ID 347>:

1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT
51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA
101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT
151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT
201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA
251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC
301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC
351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG
401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC
451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC
501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG
551 GACTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A

This corresponds to the amino acid sequence <SEQ ID 348; ORF91-1>:

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA
51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS
101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG
151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGGK*

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF91 shows 92.4% identity over a 92aa overlap with an ORF (ORF91a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

40 or£91.pep mkksslisalgigilsigmafaapadavsqirqnatqvlsilkngdantarq^ 

orf91a ^SSFISALGIGILSI^ 

!0 20 30 40 50 bO 

45 70 80 90 

orf 91 . pep YFDFQRMTALAVGNPWXTXSDXQKQALAXEFQP 

orf91a YroroROTALAVGNPTOTASDAQKQALAKE 

70 80 90 100 110 120
orf91a KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK

130 140 150 160 170 180



The complete length ORF91a nucleotide sequence <SEQ ID 349> is:

1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT
51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAACCAA ATCCGTCAAA
101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA GCGGTGATGC CAACACCGCC
151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT
201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA
251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC
301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC
351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG
401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC
451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC
501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG
551 GACTGATTGC CGAGTTGAAG GCTAAAAACG GCAGCAAGTA A



This encodes a protein having amino acid sequence <SEQ ID 350>: 

1 MKKSSFISAL GIGILSIGMA FAAPADAVNQ IRQNATQVLS ILKSGDANTA
51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS
101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG
151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGSK*

ORF91a and ORF91-1 show 98.0% identity in 196 aa overlap:

                     10        20        30        40        50        60
orf91a.pep   MKKSSFISALGIGILSIGMAFAAPADAVNQIRQNATQVLSILKSGDANTARQKAEAYAIP
             |||||:||||||||||||||||||||||:||||||||||||||:||||||||||||||||
orf91-1      MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf91a.pep   YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf91-1      YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf91a.pep   KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf91-1      KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK
                    130       140       150       160       170       180

                    190
orf91a.pep   GVDGLIAELKAKNGSKX
             ||||||||||||||:||
orf91-1      GVDGLIAELKAKNGGKX
                    190
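
The identity figure quoted for this comparison can be reproduced by counting matching positions. The following is a minimal sketch, assuming a gap-free alignment of two equal-length sequences (which holds for ORF91a and ORF91-1 as listed, with the terminal stop excluded); the helper names `percent_identity` and `join_rows` are illustrative and do not correspond to the alignment program used for the original analysis.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical residues in a gap-free, equal-length pair."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def join_rows(*rows: str) -> str:
    """Join listing rows and drop the spaces between blocks of ten residues."""
    return "".join(rows).replace(" ", "")


# ORF91a (strain A) and ORF91-1, copied from the listings above.
ORF91A = join_rows(
    "MKKSSFISAL GIGILSIGMA FAAPADAVNQ IRQNATQVLS ILKSGDANTA",
    "RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS",
    "GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG",
    "GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGSK",
)
ORF91_1 = join_rows(
    "MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA",
    "RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS",
    "GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG",
    "GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGGK",
)

print(f"{percent_identity(ORF91A, ORF91_1):.1f}% identity "
      f"in {len(ORF91A)} aa overlap")
```

The two sequences differ at four of 196 positions, which corresponds to the 98.0% figure given above.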

Homology with a predicted ORF from N. gonorrhoeae

ORF91 shows 84.8% identity over a 92aa overlap with a predicted ORF (ORF91.ng) from N.
gonorrhoeae:

orf 91 .pep 
orf 91ng 
orf 91. pep 
orf 91ng 



MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 60 

: I I | I : i I M M I I I ! I I I I 1 I : i M I I : I I I I I I I 1 I I : I I I : I I I • I I 111111 = 1 

VKKSSFISALGIGILSIGMAFASPADAVGQIRQNATQVLTILKSGDAASARPKAEAYAVP 60 

YFDFQRMTALAVGNPWXTXSDXQKQALAXEFQP 93 

1 1 1 if 1 1 i 1 1 ii ii 1 1 i ii MUM iii 

YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPIVN 120 



The complete length ORF91ng nucleotide sequence <SEQ ID 351> is predicted to encode a protein
having amino acid sequence <SEQ ID 352>:

1 VKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA
51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS
101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG
151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK

Further work revealed the complete nucleotide sequence <SEQ ID 353>: 

1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCTCCC CGGCCGACGC AGTGGGACAA ATCCGCCAAA 

101 ACGCCACACA GGTTTTGACC ATCCTCAAAA GCGGCGACGC GGCTTCTGCA 

151 CGCCCAAAAG CCGAAGCCTA TGCGGTTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG TACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

301 GGCACGATGC TGAAATTCAA AAACGCGACC GTCAACGTCA AAGACAATCC 

351 CATCGTCAAT AAGGGCGGCA AGGAAATCGT CGTCCGTGCC GAAGTCGGCA 

401 TCCCCGGTCA GAAGCCCGTC AATATGGACT TTACCACCTA CCAAAGCGGC 

451 GGCAAATACC GTACCTACAA CGTCGCCATC GAAGGCACGA GCCTGGTTAC 

501 CGTGTACCGC AACCAATTCG GCGAAATCAT CAAAGCCAAA GGCATCGACG 

551 GGCTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A 

This corresponds to the amino acid sequence <SEQ ID 354; ORF91ng-1>:

1 MKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA

51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS

101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG

151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK*
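
The correspondence between a nucleotide listing and its amino acid sequence can be sketched as a codon-by-codon translation. This is a minimal illustration, assuming the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are hypothetical, and any codon containing an ambiguous or unread base (such as N or a dot in the listings) is rendered as X.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA listing in frame 1; whitespace in the listing is ignored."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons with ambiguous or unread bases are reported as X.
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First 90 nucleotides of <SEQ ID 353>, as listed above.
seq_353_start = (
    "ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT "
    "CGGCATGGCA TTTGCCTCCC CGGCCGACGC AGTGGGACAA"
)
print(translate(seq_353_start))
```

The translated fragment can be compared against the first 30 residues of <SEQ ID 354> as a check on the listing.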

ORF91ng-1 and ORF91-1 show 92.3% identity in 196 aa overlap:

10 20 30 40 50 60 

25 orf 91-1. pep MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 

^ PF iiiicnililllllllll ll> I II 1 1:1111111 111:11 Oil I ill IIMIIH 



orf91ng-l MKKSSFISALGIGILSIGMAFASPADAVGQIRQNATQVLTILKSGDAASARPKAEAYAVP 



30 



10 

70 80 90 100 110 120 

orf 91-1. pep YFDFQRMTALAVGNPWRTASDAQKQAIJVKEFQTLLIRTYSGTMLKLKNANyNVKDNPIVN 

orfSlng-l yfdfqr^alavgnpwrtasdaqk^^ 

- - -jq 30 90 100 110 J-*u 



35 130 140 150 160 170 180 

orf91-l Pep KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASL\nVYRNQFGEIIKAK 
orf 91 -pep | j | | | | ; 1 1 1 1 1 1 : | | 1 1 1 1 1 1 1 1 I 1 1 I 1 1 1 1 1 1 I 1 1 1 1 1 I I : I I M I Ml 1 1 1 1 1 1 1 1 1 



40 



orf91ng-l KGGKEivVRAEVGIPGQKPVNMDFTTYQSGGKYRTYNVAIEGTSLVTVYWQFGEIIKAK 
' 130 140 150 160 l'O lav 



190 

orf 91-1. pep GVDGLIAELKAKNGGKX 
1:111111111111111 
45 orf91ng-l GIDGLIAELKAKNGGKX 

190 



In addition, ORF91ng-1 shows homology to a hypothetical E.coli protein:

sp|P45390|YRBC_ECOLI HYPOTHETICAL 24.0 KD PROTEIN IN MURA-RPON INTERGENIC
REGION PRECURSOR (F211) >gi|606130 (U18997) ORF_f211 [Escherichia coli]
>gi|1789583 (AE000399) hypothetical 24.0 kD protein in murZ-rpoN intergenic
region [Escherichia coli] Length = 211

Score = 70.6 bits (170), Expect = 6e-12

Identities = 42/137 (30%), Positives = 76/137 (54%), Gaps = 6/137 (4%)

Query: 59 VPYFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPI 118

+PY + AL +G +++A+ AQ++A F+ L + Y + + T * P ^ 
Sbjct: 65 LPYVQVK^AGALVLGQYYKSATPAQREAYFAAFREYLKQAYGQALAMYHGQTYQIA PE 122 

60 Query: 119 VNKGGKEIV-VRAEVGIP-GQKPVNMDFTTYQSG GKYRTYNVAI E GT S ^YZ^^^J^^? 174 

G K IV +R + P G+ PV +DF ++ G ++ Y++ EG S++T +N++G 
Sbjct: 123 QPIX3DKTIVPIRVTIIDPNGRPPVRLDFQWRKNSQTGIWQAYDMIAEGVSMITTKQNEWG 182 
Query: 175 EIIKAKGIDGLIAELKA 191
            +++ KGIDGL A+LK+
Sbjct: 183 TLLRTKGIDGLTAQLKS 199

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 42 

The following DNA sequence was identified in N. meningitidis <SEQ ID 355>:

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACTCAAAAC GAAACCGCTA

101 TGATCACGCA TACCCTCATC TCAAAATACA GTTTTGGnnn nnnnnnnnnn 

151 nnnnnnnnnn nnGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCAC GCCGAAACGG CTTAACGATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 356; ORF97>: 

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMITHTLI SKYSFGXXXX

51 XXXXAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 
151 KLIQKTVGE* 

Further work revealed the complete nucleotide sequence <SEQ ID 357>:

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC

51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACCCAAAAC GAAACCGCTA

101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT

201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG 

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 358; ORF97-1>:

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMTTHTLT SKYSFDETVS
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 
151 KLIQKTVGE* 

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF97 shows 88.7% identity over a 159aa overlap with an ORF (ORF97a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

AS nrf97 neo MKKILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG 

4!) orf97.pep MKKi^ ,,,,,,,,,, 1 1 1 1 1 | : | || 1 1 1 1 111! Illll : :IIMM 

orf97a MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 
10 20 30 40 50 60 
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orf97a 

orf97.pep 
orf97a 
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MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLM^DPAFALQLPLRVLVTETDGK 

IIIIIMIMMtlMlM! tltlllllilllllHIIItllllill I I I I Ml 

MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK 

80 90 100 HQ 120 



70 



130 140 150 160 

VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX 

I I I I I I I I » I I I 11 I I I I M I I I I I I I I I I I I I 1 1 I : I 1 I 
VRAAYT DTRALI AG SR I G FDEVANT LAN AEKLI QKT I GEX 
130 140 150 160 



The complete length ORF97a nucleotide sequence <SEQ ID 359> is:

1 ATGANACACA TACTCCCCCT GANTGNCGCA TCCGCACTCT GCATTTCAAC
51 CGCTTCGGNN CATCCTGCCA GCGAACCGCA AACCCAAAAC GAAACCGCTA
101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC
151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT
201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA
251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GTACGCCGCT GATGGTCAAA
301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCNTCG TTACCGAAAC
351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG
401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA
451 AAACTGATAC AAAAAACCAT AGGCGAATAA

This encodes a protein having amino acid sequence <SEQ ID 360>:

1 MXHILPLXXA SALCISTASX HPASEPQTQN ETAMTTHTLT SKYSFDETVS

51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK

101 DPAFALQLPL RVXVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE

151 KLIQKTIGE*

ORF97a and ORF97-1 show 95.6% identity in 159 aa overlap: 

                    10        20        30        40        50        60
orf97a.pep  MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG
            | |||||  ||||||||| ||||||:||||||||||||||||||||||||||||||||||
orf97-1     MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf97a.pep  MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK
            |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||
orf97-1     MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK
                    70        80        90       100       110       120

                   130       140       150       160
orf97a.pep  VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTIGEX
            ||||||||||||||||||||||||||||||||||||:|||
orf97-1     VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX
                   130       140       150       160

Homology with a predicted ORF from N. gonorrhoeae 

ORF97 shows 88.1% identity over a 159aa overlap with a predicted ORF (ORF97.ng) from N.
gonorrhoeae:

orf 97. pep 
orf97ng 
orf 97 .pep 
orf 97ng 
orf 97 .pep 
orf97ng 



MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG 60 
I I I I I I III I I : I I 1 1 1 1 I I I I :: I II I I I I I MM Mill : : I I I I I I 

MKHILPPIAASAFCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 60 

MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 120 
M I I M I I I II II M I 1 1 I II I I II M I M I I M M II II M M I I I I I I I 1 I I I I I I I I 
MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 120 

VRAAYTDTRALI AGSRIGFDEVANTLANAEKLIQKTVGE 159 

1 1 : 1 M 1 1 1 1 1 1 : 1 1 1 1 : 1 I 1 1 1 I 1 1 1 1 I 1 1 1 1 1 I I I I I 
VRTAYTDTRALIVGSRISFDEVANTLANAEKLIQKTVGE 159 
The complete length ORF97ng nucleotide sequence <SEQ ID 361> is predicted to encode a protein
having amino acid sequence <SEQ ID 362>:

1 MKHILPPIAA SAFCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE

151 KLIQKTVGE*

Further work revealed the complete nucleotide sequence <SEQ ID 363>: 

1 ATGAAACACA TACTCCCcct gatcgccgca TccgcactCT GCATTTCAAC 

51 CGCTTCGGCA CACCCTGCCG GCAAACCGCC CACCCAAAAC GAAACCGCTA 

101 TGACCACGCA CACCCTCACC TCGAAATACA GTTTTGACGA AACCGTCAGC

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT

201 CGACCATCAG GAAGCGGCAC GCCGAAACGG CCTGACCATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAGGCCG GTACGCCgct GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCCTCG TTACCGAAAC

351 GGACGGCAAA GTACGCACCG CCTATACCGA TACGCGCGCC CTCATCGTCG

401 GCAGCCGCAT CAGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 364; ORF97ng-1>:

1 MKHILPLIAA SALCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK

101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE

151 KLIQKTVGE*

ORF97ng-1 and ORF97-1 show 96.2% identity in 159 aa overlap:

10 20 30 40 50 60 

75 orf97-l pep MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 

a 0rry A ' P F . ........... . i » i i i i ii u • • i I II ! I M 1 I ! I I I I I I I I I I I II I I I I I I I M I 



30 



35 



• - HiMIIMIIlMlllimil::) M I I I I M I II I I II I M I I M M M M M I M 

orf97na-l MKHILPLIAASALCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 
y 10 20 30 40 50 60 

70 80 90 100 110 120 

or-97-1 Pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 

orf97na-l MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 
y 70 80 90 100 110 120 



130 140 150 160 

or f 97-1 pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX 
P P ||: | It HUll: Ml 1:1 I I II ill Ml II I II III Ml 
orf97na-i VRTAYT DTRALI VGS RI S FDEVANT LANAEKLIQKT VGEX 
40 ^ 130 140 150 160 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it was predicted that the proteins from N.meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF97-1 (15.3kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figures
12A & 12B show, respectively, the results of affinity purification of the GST-fusion and His-fusion
proteins. Purified GST-fusion protein was used to immunise mice, whose sera were used for
Western Blot (Figure 12C), ELISA (positive result), and FACS analysis (Figure 12D). These
experiments confirm that ORF97-1 is a surface-exposed protein, and that it is a useful immunogen.

Figure 12E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF97-1.
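
A hydrophilicity plot of this kind is typically a sliding-window average of per-residue scores along the sequence. The sketch below is illustrative only: the text does not state which scale or window was used for Figure 12E, so the Hopp-Woods scale and a window of six residues are assumptions, and the names `HOPP_WOODS` and `hydrophilicity_profile` are hypothetical.

```python
# Hopp-Woods hydrophilicity values per residue (assumed scale, not stated in the text).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydrophilicity_profile(sequence: str, window: int = 6) -> list[float]:
    """Mean hydrophilicity of each window of consecutive residues.

    Residues missing from the scale (for example X) are scored as 0.0.
    """
    scores = [HOPP_WOODS.get(res, 0.0) for res in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# N-terminal 20 residues of ORF97-1 <SEQ ID 358>, used here as a short example.
ORF97_1_NTERM = "MKHILPLIAASALCISTASA"
profile = hydrophilicity_profile(ORF97_1_NTERM)
print(len(profile), "windows; most hydrophilic window mean:", max(profile))
```

Peaks in such a profile mark stretches that are more likely to be solvent-exposed, which is why plots of this type are commonly shown alongside antigenic index predictions.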



Example 43 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
365>:

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC

51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA 

101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC 

151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGg 

201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG 

251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACaATATT

301 GACTACAAAC TGAGTTTCCA TCCGCTGACc AaACGCTACC GCGTTACCgT 

351 CGgCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA 

401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT 

451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC

501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC

551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA 

This corresponds to the amino acid sequence <SEQ ID 366; ORF106>: 

1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS 

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI 

101 DYKLSFHPLT KRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK* 

Further work revealed the following DNA sequence <SEQ ID 367>: 



1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC 

51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA 

101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC

151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGG 

201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG 

251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACAATATT 

301 GACTACAAAC TGAGTTTCCA TCCGCTGACC AACCGCTACC GCGTTACCGT 

351 CGGCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA

401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT 

451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC 

501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC 

551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA 

This corresponds to the amino acid sequence <SEQ ID 368; ORF106-1>:

1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI 

101 DYKLSFHPLT NRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG 

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK* 

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF106 shows 87.4% identity over a 199aa overlap with an ORF (ORF106a) from strain A of N.
meningitidis:

10 20 30 40 50 59 

45 orfl06.pep MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ 

I I I I I I I I I I I II:: II : : :: I I I I I I I I I I I i I I : I I I I I I I I I I I I I I I I 
orfl06a MAFITRLFKS IKQWLVLLPMLSVLPDAAAEG I DVSRAEARIXDGGQLSXXSRFQTELPDQ 

10 20 30 40 50 60 

50 60 70 80 90 100 110 119 

orf 106 . pep LQQALRRGVPLNFTLSWQLSAPIIASYRFKUK?LIGDDDNIDYKLSFHPLTKRYRVTVGA 
lit III I I I I I 1 t 1 I I I 1 I I 1 I M I I I III llltlll |:Mlillll 
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orf 106a LQXAXXRGVXLNXTLXWQLSAPIIASYRFXLGQLIGDDDXIDYKLSFHPLTNRYRVTVGA 
70 80 90 100 110 120 

120 130 140 150 160 170 179 

orfl06 pep FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 
Ml I 1 I | i | | | I | 1 | 1 I I I f I t I I I I I I I 1 I I I I 1 1 I I I I V I I ^1 K I I | | f | | I I 1 t I 1 
or f 10 6a FSTXYDTLDAAIJlATGAViU'lWKVLNKGALSGAEAGETKAElRLTLSTSKLPKPFQINALT 
130 140 150 160 170 180 

180 190 199 

or f 10 6 . pep SQNWHLDSGWKPLN I IGNKX 
I I I I I I I I I I I I I I I I I II I 
orf!06a SQNWHLDSGWKPLN I IGNKX 

190 200 



Due to the K→N substitution at residue 111, the homology between ORF106a and ORF106-1 is
87.9% over the same 199 aa overlap.
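
The shift from 87.4% to 87.9% is what a single additional matching residue produces over a 199-position overlap. The short calculation below is a sketch of that arithmetic; the count of 174 identical positions for ORF106 is an assumption inferred from the stated percentage rather than a figure given in the text, and `identity_percent` is a hypothetical helper.

```python
def identity_percent(identical: int, overlap: int) -> float:
    """Percent identity rounded to one decimal place, as quoted in the text."""
    return round(100.0 * identical / overlap, 1)


OVERLAP = 199            # alignment length stated in the text
IDENTICAL_ORF106 = 174   # assumed count, inferred from the stated 87.4%

before = identity_percent(IDENTICAL_ORF106, OVERLAP)
# The K->N change at residue 111 makes ORF106-1 match ORF106a at one more position.
after = identity_percent(IDENTICAL_ORF106 + 1, OVERLAP)
print(before, after)
```

The same one-residue step accounts for the change from 90.5% to 91.0% reported for the gonococcal homologue over its 199 aa overlap.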

The complete length ORF106a nucleotide sequence <SEQ ID 369> is:

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT
51 GCTGCCGATG CTTTCCGTTT TGCCGGACGC GGCGGCGGAG GGGATAGATG
101 TGAGCCGCGC CGAAGCGAGG ATAANCGACG GCGGGCAGCT TTCCATNAGN
151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAANNNG CGNNGNGCCG
201 GGGCGTGNCG CTCAACTNTA CCTTAAGNTG GCAGCTTTCC GCCCCGATAA
251 TCGCTTCTTA TCGGTTTNAA TTGGGGCAAC TGATTGGCGA TGACGACNAT
301 ATTGACTACA AACTGAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC
351 CGTCGGCGCG TTTTCGACAG ANTACGACAC CTTGGATGCG GCATTGCGCG
401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGCTGTCC
451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC
501 TTCAAAACTG CCCAAGCCTT TTCAAATCAA TGCATTGACT TCTCAAAACT
551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA



This encodes a protein having amino acid sequence <SEQ ID 370>:

1 MAFITRLFKS IKQWLVLLPM LSVLPDAAAE GIDVSRAEAR IXDGGQLSXX

51 SRFQTELPDQ LQXAXXRGVX LNXTLXWQLS APIIASYRFX LGQLIGDDDX

101 IDYKLSFHPL TNRYRVTVGA FSTXYDTLDA ALRATGAVAN WKVLNKGALS

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK*


Homology with a predicted ORF from N. gonorrhoeae 

ORF106 shows 90.5% identity over a 199aa overlap with a predicted ORF (ORF106.ng) from N. 
gonorrhoeae: 






orf 106. pep 
orf 106ng 
orf 106. pep 
orfl06ng 
orfl06.pep 
orf 106ng 
orf 106. pep 
orf 106ng 



MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ 
| I t I I I i i I I I 11:: : | :: : : I t I I I : : I I I I I I I 1 I I : I I t I 1 I t I I I I I ! I 
MAFITRLFKSIKQWLVLLPILSVLPDAAAEGIAATRAEARITDGGRLSISSRFQTELPDQ 



59 



60 



119 



LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA 
I I I I I I I I I I I I 11 I I! I I I I I lllimilllMll!limmiM|:ll!Ml!l 
LQQALRRGVPLNFTLSWQLSAPTIASYRFKLGQLIGDDDNIDYKLSFHPLTNRYRVTVGA 120 

FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 179 

MlllllllllllllllillllMlllllllMIIMIIIIIIIIIIIilllllllllll 
FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 180 

SQNWHLDSGWKPLN I IGNK 198 
I i I It I II II II I I I II I I 
SQNWHLDSGWKPLNIIGNK 199 



Due to the K→N substitution at residue 111, the homology between ORF106ng and ORF106-1 is
91.0% over the same 199 aa overlap.

The complete length ORF106ng nucleotide sequence <SEQ ID 371> is:

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT 

51 GTTGCCGATA CTCTCCGTTT TGCCGGACGC GGCGGCGGAG GGCATTGCCG 

101 CGACCCGCGC CGAAGCGAGG ATAACCGACG GCGGGCGGCT TTCCATCAGC 

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAACAGG CGTTGCGCCG

201 GGGCGTACCG CTCAACTTTA CCTTAAGCTG GCAGCTTTCC GCCCCGACAA 

251 TCGCTTCTTA TCGGTTTAAA TTGGGGCAAC TGATTGGCGA TGACGACAAT 

301 ATTGACTACA AACTAAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 

351 CGTCGGCGCA TTTTCCACCG ATTACGACAC TTTGGATGCG GCATTGCGCG 

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGTTGTCC

451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC

501 TTCAAAACTG CCCAAGCCTT TCCAAATCAA CGCATTGACT TCTCAAAACT 

551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 372>: 

1 MAFITRLFKS IKQWLVLLPI LSVLPDAAAE GIAATRAEAR ITDGGRLSIS

51 SRFQTELPDQ LQQALRRGVP LNFTLSWQLS APTIASYRFK LGQLIGDDDN

101 IDYKLSFHPL TNRYRVTVGA FSTDYDTLDA ALRATGAVAN WKVLNKGALS

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK*

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF106-1 (18kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
13A shows the results of affinity purification of the His-fusion protein, and Figure 13B shows the
results of expression of the GST-fusion in E.coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for FACS analysis (Figure 13C). These experiments confirm that
ORF106-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 44 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
373>:

1 ATGGACACAA AAGAAATCCT CGG.TACGCG GcAGGcTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCc TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTgACGGTG 

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CACCGCCGAC AAAGACAcCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCACT CGACGATGCC gCCGCCGGCa TCGGGCTGGT 

351 GCTGTTTGAA CtGAGCTTCC TGCCCATCCG cTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCcTT GCCTTTTCGT CCGCGCAACT CGTGCcCAAG 

451 CTCGCCATCC TGCTGCTG.T GCCGCTGACG GTCGGGCTGC TGCACTTTCC

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGG.TGC GCTACGGCAT

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC 

851 CCGCTCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGC. TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATG.TGCCGC

1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTT

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG ACCGTGCCGT ACCGGCGAGG CCGCC.GGCG 

1151 CGGCGGTTGC CTGTGCCGCC TCATTCTGGC TGTTTTTTGC CTTCAAGACC 

1201 GAAAGCTCyT GCCGCCTGTG GCAGCCGCTC AAACGCCTGC CGCTTTATCT

1251 GCACACATTG TTCTGCCTGA CCTCCTCGGC GGCCTACACC TGCTTCGGCA

1301 CGCCGGCAAA CTATCCCCTG TTTGCCGGCG TATGGGCGGC ATATCTGGCA 

1351 GGCTGCATCC TGCGCCACCG GAAAGATTTG CACAAACTGT TTCATTATTT 

1401 GAAAAAACAA GGTTTCCCAT TATGA 

This corresponds to the amino acid sequence <SEQ ID 374; ORF10>:

1 MDTKEILXYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK 

151 LAILLLXPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 HAPFSPAVLH RGXRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS

301 ALCXTGIFSP LASLLLPENY AAVRFIVVSC MXPPLFCTLA EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLDRAVPAR PXGAAVACAA SFWLFFAFKT

401 ESSCRLWQPL KRLPLYLHTL FCLTSSAAYT CFGTPANYPL FAGVWAAYLA

451 GCILRHRKDL HKLFHYLKKQ GFPL*

Further sequence analysis revealed the complete DNA sequence <SEQ ID 375> to be:

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG 

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC

201 CACCGCCGAC AAAGACACCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCACT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAG

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCGC

1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTGCCGTC CGGCGGCGCG CGCGGCGCGG 

1151 CGGTTGCCTG TGCCGCCTCA TTCTGGCTGT TTTTTGCCTT CAAGACCGAA

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATCTGCA 

1251 CACATTGTTC TGCCTGACCT CCTCGGCGGC CTACACCTGC TTCGGCACGC 

1301 CGGCAAACTA TCCCCTGTTT GCCGGCGTAT GGGCGGCATA TCTGGCAGGC 

1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA 

1401 AAAACAAGGT TTCCCATTAT GA

This corresponds to the amino acid sequence <SEQ ID 376; ORF10-1>:

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV

51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK

151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR

201 HAPFSPAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS

301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLA EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFAFKTE

401 SSCRLWQPLK RLPLYLHTLF CLTSSAAYTC FGTPANYPLF AGVWAAYLAG

451 CILRHRKDLH KLFHYLKKQG FPL*

Computer analysis of this amino acid sequence gave the following results: 
Prediction

ORF10-1 is predicted to be the precursor of an integral membrane protein, since it comprises
several (12-13) potential transmembrane segments, and a probable cleavable signal peptide.

Homology with EpsM from Streptococcus thermophilus (accession number U40830)

ORF10 shows homology with the epsM gene of S. thermophilus, which encodes a protein of a size
similar to ORF10 and is involved in exopolysaccharide synthesis. Other homologies are with
prokaryotic membrane proteins:

Identities = (25%) 

10 Query 213 LRYG I PLALSSLAYWGIASADRLFLKKYAGLEQLGVySMGI S FGGAALLLQS I fSTVW 270 

L Y +PL SS+ +W L ++ R F+ + G G+ ++ + + IF+ w 

Sbjct: 210 LYYALPLIPSSILWWLUJASSRYFVLFFLGAGANGLlAVATKIPSIISIFNTirTQAW 267 



15 Identities - 15/57 (26%), Positives = 31/57 (54%) 

Query: - l^G^gT+l"' + G LQTAL + ++ + + 



20 



25 



7 LGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQAYVR 63 
. _..,.«. i t j ■ lot * r. LOTAL + ++ + +A+R 



L + G++GS +L +++PL ++ + G L ui " » ™ 

Sbjct: 12 LVFTIGNLGSKLLVFLLVPLYTYAMTPQEYGMADLYQTTANLLLPLITMNVFDATLR 68 

Identities » 16/56 (161). Positives = 36/96 (37%) 
Ouerv 307 IFSPLASIALPENYAAVRFTWSCMLPPLFYTLTEISGIGLNVVRKTRPIXXXXXXXXXX 366 

+ p+ ++ +YA+ V ML LF + ++ G ++T+ + 

Sbjct: 305 VLKPI VEKWSSDYAS SWQYVPFFMLSMLFSSFS DFFGTN Y I AAKQTKGVFMTS I YGT IV 364 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF10 shows 95.4% identity over a 475aa overlap with an ORF (ORF10a) from strain A of N.
meningitidis:

,a 10 20 30 40 50 60 

or£10 pep MDTKEIiaYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVI^OTAAGLTVSVLCLGLDQA 
o pep , , | ,,, | | | | | | I I 1 1 I I I I I I I I I I I I I I I I t I I I I M I I I I I I I I I I I I I I I I I I I I 



I I I I I I I I I | | | | | | | | I I I I I I I I I I I i I I t I I I I M I M I I I I I I " ' ■ 

orflOs MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
10 20 30 40 50 60 

70 80 90 100 110 120 



35 70 80 90 100 110 120 

r^^^ 

orflOa YVREYYAAADKDTLFKTLFLPPLLSAAAIAALI.LSRPSLPSEILFSLDDAAAGIGLVLFE 
40 70 80 90 100 110 120 

130 140 150 160 170 180 

otflO . pep LSFLPIRFLLLVIJMEGRAIAFSSAQLVPKI^ 

ii mini iii illinium m i iiiuu 1 1 1 1 1 u u 1 1 1 u 1 1 u u 1 1 1 

45 orfl0 a [sFLPIRFLLLVLIWEGRAWFSSAQLVSKLAILLLLPLTVGIXHFPANTAVLTAVYALA 

HJ 130 140 150 160 170 180 

190 200 210 220 230 240 

orflO pep NlAAAAFLLFQNRCRU^VRHAPFSPAVIJiRGXRYGIPIALSSIAYWGLASADRLFliKiW 
ortiu.pep | | | | | i | l I I I I I % I 1 1 1 I I I I I 1 1 I I 

50 orfl0a NLAAAAFLLFQNRCR^^ 

190 200 210 220 230 240 

250 260 270 280 290 300 

55 otflO pep AGLEQIXjVYSMGISFGGAALLFQSIFSTWTPYIFRAIEENAPPARLSATAESAAALLAS 

53 orflO.pep || 1 1 1 II II I 1 1 II 1 1 1 1 II II II II 1 1 1 1 1 U II 1 1 1 I "IIUUIIIUIU II 

orflOa AGLEQLGVYSMG I S FGGAALLFQS I FSTVWT P YI FRA1 EAN APPARLS ATAE SAAALLAS 

250 260 270 280 290 300 
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310 320 330 340 350 360 

orflO pep ALCXTGIFSPIASLLLPENYAAVRFIWSC^XPPLFCTLAEISGIGLNVVRKTRPIALAT 
IN I I I I 1 I 1 | l | t ] ! I I I I ( I I I 1 t I 1 I I 1 M | t I I : I I I M i I I I I I | t I I I I I I I 
orflOa ALCLTGIFSPLASLLLPENYAAVRFIWSCMLPPLFCTLVEISGIGLNWRKTRPIALAT 

310 320 330 340 350 360 

370 380 390 400 410 419 

orflO.pep LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 
1 I I I I I I I I I I I I Mi: I I I I I I I I I I M I I : I I 1 1 I I I I I I I I I I I I I I 1:1 I 
orflOa LGALAANLLLLGL — AVPSGGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 

370 380 390 400 410 



15 



420 430 440 450 460 470 

orf 10 . pep L FC LT S S AA YT C FGT PAN Y PLFAG VWAA YLAGC I LRHRK DLHK L FH YLKKQG FPLX 
I I I I : I I I I I I I I I I I I I II I I I I I I I • I 1 M 1 I I I I 1 I ! I I I I I I i I I ! I I I 1 I 1 
orf 1 0a LFCLASSAAYTC FGT PAN YPLFAGVWAVYLAGC I LRHRKDLHKLFH YLKKQG FPLX 

420 430 440 450 460 470 



The complete length ORF10a nucleotide sequence <SEQ ID 377> is:

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC
51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCTGCCG
101 ACGACATCGG ACGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG
151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC
201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC
251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC ATCCCTGCCG
301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT
351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC
401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGTCCAAG
451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC
501 GGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG
551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG
601 CGCGCACCGT TTTCATCCGC CGTCCTGCAT CGCGGCCTGC GCTACGGCAT
651 ACCGATCGCA CTAAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC
701 GTTTGTTCCT GAAAAAATAT GCCGGCCTAG AACAGCTCGG CGTTTATTCG
751 ATGGGTATTT CGTTCGGCGG AGCGGCATTA TTGTTCCAAA GCATCTTTTC
801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGCA AACGCCCCGC
851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC
901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC
951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCTC
1001 CGCTGTTTTG CACGCTGGTA GAAATCAGCG GCATCGGTTT GAACGTCGTC
1051 CGAAAAACAC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA
1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCGCG CGCGGCGCGG
1151 CGGTTGCCTG TGCCGCCTCA TTTTGGCTGT TTTTTGTTTT CAAGACCGAA
1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA
1251 CACATTGTTC TGCCTGGCCT CCTCGGCGGC CTACACCTGC TTCGGCACTC
1301 CGGCAAACTA CCCCCTGTTT GCCGGCGTAT GGGCGGTATA TCTGGCAGGC
1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA
1401 AAAACAAGGT TTCCCATTAT GA


This encodes a protein having amino acid sequence <SEQ ID 378>: 



1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV
51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP
101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVSK
151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR
201 RAPFSSAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS
251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEA NAPPARLSAT AESAAALLAS
301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLV EISGIGLNVV
351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFVFKTE
401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAVYLAG
451 CILRHRKDLH KLFHYLKKQG FPL*



ORF10a and ORF10-1 show 95.4% identity in 475 aa overlap:

10 20 30 40 50 60 

or f 10-1 . pep MDTKE I LX Y AAGS I GSAVLAVI I LPLLSWYFP ADD IGRIVLMQTAAG LTV SVLCLGLDQA 

Mini; 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orflOa MDTKE I LGYAAG SI GSAVLAVI I LPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

10 20 30 40 50 60 




orf 10-1- pep 
orflOa 

orf 10-1- pep 
orflOa 

orflO-l.pep 
orflOa 

orf 10-1. pep 
orflOa 

orflO-l.pep 
orflOa 

orf 10-1 .pep 
orflOa 

orf 10-1. pep 
orflOa 






70 80 90 100 110 120 

Mil II It M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 | I 1 1 1 I 1 1 1 1 1 1 1 I 1 1 I I 1 I 1 1 I 1 1 1 IN* I II 

70 80 90 100 110 AZU 

130 140 150 160 170 180 

LS FLPI RFLLLVLRMEGRALAFSSAQLV PKLAI LLLXPLT VG LLH FPANTAV LTAVYALA 

14 0 150 160 1*70 iyu 



130 



230 



240 



190 200 210 220 

NLAAAAFLLFQNRCRLKAVRHAP FS PAVLHRGXR YG I P IAL SS IAYWGLASADRLFLKKY 
I mil I III III III I II I: I IN I 1 I 1 I I IIIIIIIIIIMIIIIIIM1IIIIII 
nl^fi1tonrciu,kavrrapfssavlh^ 

200 210 220 230 240 



190 



300 



250 260 270 280 290 

agleolgvysmgisfggaallfqsifstvwtpyifraieenapparlsataesaaallas 

immimmmmimmmmmmmmmiimm in u n 1 1 1 1 1 m i mi i i 
agleqlgwsmgisfggaallfqsifstvwtpyifraieanapparlsat^ 

260 270 280 290 300 



250 



360 



310 320 330 340 350 

alcxtgifspiaslllpenyaavrfivvscmxpplfctlaeisgiglnvvrktrpialat 
Ml IIMIMMMMMMMMMMIl I 1 1 M M : I 11 M I II I M I M M I M I 

^cltgifspiaslllpenyaavrfiwscmlp^ 

330 340 350 360 



310 



320 



419 



370 380 390 400 410 

LGALAANLLIJX3L0RAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 

T7MMI I I M 1 1 Ml: I M I I M I M M M : M M I I I M M II M I I M M I 
L^y^LAANLLLLGL-- AVPSGGARGAAVACAAS FWLFFVFKTE S SCRLWQPLKRLPLYMHT 
370 380 390 400 410 

470 



420 430 440 450 460 

LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 

I II Ml Mill M II III M MM M I Ml MM M M I P 1 



t I I I • I 1 I I I j I I I I I I M I I I I I l I I : i i i 1 1 i m i ii i i i i i i M II I M II M 
LFCLAS SAAYTCFGTPANYPLFAGVWAVYLAGC I LRHRKDLHKLFHYLKKQGFFLX 
430 440 450 460 470 



420 



Homology with a predicted ORF from N. gonorrhoeae

ORF10 shows 94.1% identity over a 475aa overlap with a predicted ORF (ORF10.ng) from N. gonorrhoeae:



orflOng.pep 
orf lOnm 
orf lOng.pep 



orf lOnm 



orflOng.pep 
orf lOnm 



60 



60 



MDTKEILGYAAGSIGSAVIAVIILPIXSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
I I I I I M I M M I M I II I M I M I II M M II M M M M 1 M M M M II M M III 
MDT^ILXY^G^ 

YTOEYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 120 

MM Ml- Ml Ml I I MM MM : M M M M M M I I I M II M I II M II II M M 
YVREYYATMKDTLEOTLFLPPLLS 1 



LSFLPIRFLLLVLRMEGRALAFSSAQLVPKIAILLLLPLTVGLLHFPANTSVLTAVYALA 
i | I i M | M I M M 1 1 II II M I M II I II I M M I I I I I M I I M II I : M M M I M 
LSnPIRm-LV^ 



180 



180 



240 



orflOng.pep NLAAAAFLLFQNRCRLKAVRRAP FS PAVLHRGLRYG I P**^* ^ ® **^m ? V^f^n *n m**mT 
1 1 I II II II II II II I M II J 1 1 M I M M M M I M M M I : I I II M I II I M M I I 



orf lOnm 



300 



orf lOng .pep AGIXQI^YSMGISFGGAALLWSIFST^ 

g P P j I I 1 1 I 1 1 1 1 1 1 1 1 1 j I j 1 1 j : I I I I I I I I M II I M 1 1 M I I Mil I M MMIl II 

AG^QLG^SMGISFGGAALLFQSIFSTVWTPYIFRAIEEN^^ 300 

ALCLTGIFSPIASLIJ*PENYAAVRFTWSCMLPPLFYTLTEISGIGLNVVRKTRPIAiAT 360 
| M M II M I 1 1 M II I II M IMM II II 1 1 M M I I II I II M II I I II I I 



orf lOnm 
orflOng.pep 



I I 1 



orf lOnm 



ALCXTGIFSPLASLUPENYAAVRFIWSC^PPLFCTU^ 



360 






370 380 390 400 410 

orf lOng . pep LGALAANLLLLGL — AVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 
tltlllll!l||| Mi: I lit Ullli;! It: Ml || II ! ! I I MiMI II: I I 
5 orflOnm LGALAANLLLI^LDRAVPAR-PXGAAVACAASFWLFFAFTCTESSCRLWQPLKRLPLYLHT 

370 380 390 400 410 

420 430 440 450 46O 470 

orf lOng . pep LFC1ASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX 

10 I I I i : t I I M I I I I 1 I I I I I | I I I I I I (! I I I I I I i I I I : I I M I I I I I I I I I I I 1 

orflOron LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
420 430 440 450 460 470 

The complete length ORF10ng nucleotide sequence <SEQ ID 379> is:

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC
51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCcccgCCG
101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG ACTGACGGTG
151 TCGGTATTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC
201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC
251 TGTTTTCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG
301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT
351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC
401 GTATGGAAGG GCGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAA
451 CTCGCCATTC TGCTGCTGTT GCCGCTGACG GTCGGGCTGC TGCACTTTCC
501 GGCGAACACC TCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG
551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG
601 CGCGCGCCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT
651 ACCGCTCGCA CTGAGCAGCC TTGCCTATTG GGGGCTGGCA TCCGCCGACC
701 GTTTGTTCCT GAAAAAATAT GCGGGCCTGG AACAGCTCGG CGTTTATTCG
751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGCTCCAAA GCATCTTTTC
801 AACGGTCTGG ACACCGTATA TTTTCCGTGC AATCGAAGAA AACGCCACGC
851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC
901 GCCCTCTGCC TGACCGGAAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC
951 GGAAAACTAC GCCGCCGTCC GGTTTACCGT CGTATCGTGT ATGCTGccgc
1001 cgctGTTTTA CACGCTGACC GAAATCAGCG GCATCGGTTT GAACGTCGTC
1051 CGCAAAACGC GTCCGATCGC GCTTGCCACC TTGGGCGCGC TGGCGGCAAA
1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCACG CGCGGCGCGG
1151 CGGTTGCCTG TGCCGCCTCA TTCTGGTTGT TTTTTGTTTT CAAGACAGAA
1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA
1251 CACATTGTTC TGCCTgGCCT CCTCGGCGGC CTACACCTGC TTCGGCACAC
1301 CGGCAAACTA CCCcctgttt gccggcgtAT GGGCGGCATA TCTGGCAGGC
1351 TGCATCCTGC GCCACCGGAA AAATTTGCAC AAACTGTTTC ATTATTTGAA
1401 AAAACAAGGT TTCCCATTAT GA



This encodes a protein having amino acid sequence <SEQ ID 380>: 



1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV
51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLFSAAIA ALLLSRPSLP
101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK
151 IAILLLLPLT VGLLHFPANT SVLTAVYALA NLAAAAFLLF QNRCRLKAVR
201 RAPFSPAVLH RGLRYGIPLA LSSLAYWGLA SADRLFLKKY AGLEQLGVYS
251 MGISFGGAAL LLQSIFSTVW TPYIFRAIEE NATPARLSAT AESAAALLAS
301 ALCLTGIFSP LASLLLPENY AAVRFTVVSC MLPPLFYTLT EISGIGLNVV
351 RKTRPIALAT LGALAANLLL LGLAVPSGGT RGAAVACAAS FWLFFVFKTE
401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAAYLAG
451 CILRHRKNLH KLFHYLKKQG FPL*



ORF10ng and ORF10-1 show 96.4% identity in 473 aa overlap:



10 20 30 40 50 60 

orf 10-1. pep MDTKE I LGYAAGSIGSAVLAVI I LPLLSWYFPADDIGRI VLMQTAAGLTVSVLCLGLDQA 
I I I I I! I I I I I I I I I I 1 I I I I I 1 I I I I I I I I II I I I I I I I! I I I If I I I I I I I I I i I I I I 
orflOng-1 MDTKEI LGYAAGSIGSAVLAVI I LPLLSWYFPADDIGRI VLMQTAAGLTVSVLCLGLDQA 
10 • 20 30 40 50 60 

70 80 90 100 110 120 

orf 10-1 . pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
1 1 I I I I (: 1 1 I I I i I I I I I I I 1 I I = J I 1 1 1 I 1 I I I I I 1 I I J I 1 I I I I I I I I | 1 | | | | t I 
orflOng-1 YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
70 80 90 100 110 120 






130 140 I 50 170 180 



orf 10-1. pep 
orf 10ng-l 

orf 10-1 .pep 



190 200 210 220 230 240 

250 260 270 280 290 300 



orflOng-1 AGI 



250 260 270 280 

AGIXQl^SMGISFGGAALLFQSIFSTVOTPYIFRAIEENAPPAR^ATAESAAALIAS 

I I II I II I I M I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I ' I I M I I I I 1 1 1 I I 1 1 I I 
iLEQLGVYSMGI SrcGAALLLQSI FSTVWTPYI FRAIEENAT PARLSATAESAAALLAS 
9«n 270 280 290 300 



250 260 270 280 

310 320 330 340 350 360 

orflO-1 oeo W,CLTGIFSPIASLLLPENYAAVRFIWSCMLPPLFCT1^ISGIGLNV^RP1A^T 
orf 10 1. pep Ai.Cl.Tta Milium II : I I I I I I I I I I I I I I III I I 

orfl0ng-l W^LTGIFSPI^IiLPEOTAAVRFTVVSCMLPPLFYTL^ 

370 380 390 400 410 420 

orfl0-l Pep I^AIAANLLLLGLAVPSGGARGMVACAASFWLFFAFKTESSCRLWQPL^LPLYLHTLI" 
orf 10 1 .pep , . 1 1 1 1 : 1 1 1 1 1 I I I I I I I 1 1 I : I I I I I I I I I I I I I I I 1 1 1 1:1 1 1 1 



310 320 330 340 350 

370 380 390 400 410 

w )>vi jWILLLLGLAVPSGGARGAAVACAASFWLFFAFKTESSCRLWQPLKI 

■ VTmii I I I I I I I III I I : I III I I I I I I III I I : I I I I I I I I I I I I I I I I I I "J, ' ' ' 1 

orfl0no-l IGMjAANLLlLsLAVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQP^ 
0 y 370 380 390 400 410 420 

430 440 450 460 470 

orf 10-1 . pep CLTSSAAYTCFGTPANYPLFAGVWAAYLAGCIIAHRKD^FHY^QGFPW 
P F I I . , 1 I 1 I 1 1 I 1 I I m I I I I I I I I I I I I I I I I I I I I = I I I I I I I I I I I I I I I 1 
orflOna-1 CliisAAYTCFGTPANYPLFAGWAAYIAGCXUlHRKNI^LFHYLKKQGFPLX 
A-*r\ aaci 4S0 460 470 



430 440 450 460 

Based on this analysis, including the presence of a putative leader peptide, several
transmembrane segments and a leucine-zipper motif (4 Leu residues spaced by 6 aa,
shown in bold), it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
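
The leucine-zipper criterion mentioned above (4 Leu residues spaced by 6 aa) can be checked mechanically. The following is a minimal sketch, assuming the motif means L-x(6)-L-x(6)-L-x(6)-L with any residue allowed at the x positions; the function name is hypothetical and this is not necessarily the tool used for the original analysis.

```python
import re

# Assumption: "4 Leu residues spaced by 6 aa" is read as L-x(6)-L-x(6)-L-x(6)-L,
# i.e. six arbitrary residues between consecutive leucines.
# The lookahead lets overlapping occurrences be reported as well.
_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein):
    """Return (0-based start, matched segment) for every motif occurrence."""
    # Remove the spaces used to group residues in blocks of ten.
    seq = "".join(protein.split()).upper()
    return [(m.start(), m.group(1)) for m in _ZIPPER.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence, not taken from the sequence listing.
    toy = "MK" + "L" + "A" * 6 + "L" + "A" * 6 + "L" + "A" * 6 + "L" + "GG"
    print(find_leucine_zippers(toy))
```

Passing one of the amino acid sequences listed above (spaces included) returns the positions of any candidate motifs, which can then be compared with the residues marked in bold in the printed specification.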

Example 45 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 381>: 

1 ATCCTGAAAC CGCATAACCA GCTTAAGGAA GACATCCAAC CTGATCCGGC 

51 CGATCAAAAC GCCTTGTCCG AACCGGATGC TGCGACAGAG GCAGAGCAGT

101 CGGATGCGGA AAATGCTGCC GACAAGCAGC CCGTTGCCGA TAAAGCCGAC 

151 GAGGTTGAAG AAAAGGCGGG CGAGCCGGAA CGGGAAGAGC CGGACGGACA 

201 GGCAGTGCGT AAGAAAGCGC TGACGGAAGA GCGTGAACAA ACCGTCAGGG 

251 AAAAAGCGCA GAAGAAAGAT GCCGAAACGG TTAAAATACA AGCGGTAAAA 

301 CCGTCTAAAG AAACAGAGAA AAAAGCTTCA AAAGAAGAGA AAAAGGCGGC 

351 GAAGGAAAAA GTTGCACCCA AACCAACCCC GGAACAAATC CTCAACAGCG 

401 GCAaCATCGA AAAmGCGCGC AgTGCCGCCG CCAAAGAAGT GCAGAAAATG 

451 AA AACGTCC GACAAGGCGG AAGC.AACGC ATTATCTGCA AATGGGCGCG 

501 TATGCCGACC GTCAGAGCGC GGAAGGGCAG CGTGCCAAAC TGGCAATCTT 

551 GGGCATATCT TCCAAGGTGG TCGGTTATCA GGCGGGACAT AAAACGCTTT 

601 ACCGGGTGCA AAGCGGCAAT ATGTCTGCCG ATGCGGTGA 

This corresponds to the amino acid sequence <SEQ ID 382; ORF65>: 

1 ILKPHNQLKE DIQPDPADQN ALSEPDAATE AEQSDAENAA DKQPVADKAD 
51 EVEEKAGEPE REEPDGQAVR KKALTEEREQ TVREKAQKKD AETVKIQAVK






101 PSKETEKKAS KEEKKAAKEK VAPKPTPEQI LNSGSIEXAR SAAAKEVQKM 
151 XNVRQGGSXR IICKWARMPT VRARKGSVPN WQSWAYLPRW SVIRRDIKRF 
201 TGCKAAICLP MR* 
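
The correspondence between a nucleotide listing and its amino acid sequence can be reproduced with the standard genetic code. The sketch below is illustrative only: the function name is hypothetical, and any codon containing an unresolved or ambiguous base (shown as a dot or a non-ACGT letter in the listings) is assumed to translate to X.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate reading frame 1; '*' marks a stop codon.

    Assumption: codons with unresolved or ambiguous bases become 'X'.
    A trailing incomplete codon is ignored.
    """
    seq = "".join(dna.split()).upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    # First row of SEQ ID 381 as printed above.
    row = "ATCCTGAAAC CGCATAACCA GCTTAAGGAA GACATCCAAC CTGATCCGGC"
    print(translate(row))
```

Applied to the first row of SEQ ID 381, this gives ILKPHNQLKEDIQPDP, which agrees with the start of SEQ ID 382.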

Further work revealed the complete nucleotide sequence <SEQ ID 383>: 



1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT
51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC
101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGCTTC GTCGAAGCAG
151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT
201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA
251 CAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT
301 GCCGATAAAG CCGACGAGGT TGAAGAAAAG GCGGGCGAGC CGGAACGGGA
351 AGAGCCGGAC GGACAGGCAG TGCGTAAGAA AGCGCTGACG GAAGAGCGTG
401 AACAAACCGT CAGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA
451 AAACAAGCGG TAAAACCGTC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA
501 AGAGAAAAAG GCGGCGAAGG AAAAAGTTGC ACCCAAACCA ACCCCGGAAC
551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCCGCCAAA
601 GAAGTGCAGA AAATGAAAAC GTCCGACAAG GCGGAAGCAA CGCATTATCT
651 GCAAATGGGC GCGTATGCCG ACCGTCAGAG CGCGGAAGGG CAGCGTGCCA
701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA
751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT
801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC
851 GTTCTATCGA AAGCAAATAA



This corresponds to the amino acid sequence <SEQ ID 384; ORF65-1>:

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPASSKQ
51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAATEAEQSD AEKAADKQPV
101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK
151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSGS IEKARSAAAK
201 EVQKMKTSDK AEATHYLQMG AYADRQSAEG QRAKLAILGI SSKVVGYQAG
251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK*



Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF65 shows 92.0% identity over a 150aa overlap with an ORF (ORF65a) from strain A of N. 
meningitidis: 






10 20 30 

orf 65 .pep ILKPHNQLKEDIQPDPADQNALSEPDAATE 

1111:11 I I I I II : I I ! I 11 I I I I ! I I I 
orf 65a IIAGILF YLNQSGQNAFKIPVPSKQPAETEILKPKNQPKEDIQPEPADQNALSEPDAAKE 
30 40 50 60 70 80 

40 50 60 70 80 90 

or f 65 . pep AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 
II I M I I : I I I I I I II I t I I II I I I I Mill: I I I I I I I I I I I I I I i I I I I I I I I I I 
orf 65a AEQSDAEKAADKQPVADKADEVEEKADEPEREKSDGQAVRKKALTEEREQTVGEKAQKKD 
90 100 110 120 130 140 

100 110 120 130 140 150 

or f 65 . pep AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQI LNSGS I EXAR SAAAKEVQKM 
Mill M II II I I 1 1 M I I II II M I I M I M M I I II II I M I I II I I I I M I I i I 
orf 65a AETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKPTPEQI LNSGS IEKARSAAAKEVQKM 

150 160 170 180 190 200 

160 170 180 190 200 210 

orf 65. pep XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP 

orf 65a KT PDKAEATH YLQMGAYADRRS AEGQRAKLAI LG I S S KWGYQAGHKT LYRVQSGNMS AD 

210 220 230 240 250 260 

The complete length ORF65a nucleotide sequence <SEQ ID 385> is:

1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT
51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC
101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGTTCC GTCGAAGCAG
151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT
201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA
251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT
301 GCCGACAAAG CCGACGAGGT TGAGGAAAAG GCGGACGAGC CGGAGCGGGA
351 AAAGTCGGAC GGACAGGCAG TGCGCAAGAA AGCACTGACG GAAGAGCGTG
401 AACAAACCGT CGGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA
451 AAACAAGCGG TAAAACCATC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA
501 AGAGAAAAAG GCGGAGAAGG AAAAAGTTGC ACCCAAACCG ACCCCGGAAC
551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCTGCCAAA
601 GAAGTGCAGA AAATGAAAAC GCCCGACAAG GCGGAAGCAA CGCATTATCT
651 GCAAATGGGC GCGTATGCCG ACCGCCGGAG CGCGGAAGGG CAGCGTGCCA
701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA
751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT
801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC
851 GTTCTATCGA AAGCAAATAA

This encodes a protein having amino acid sequence <SEQ ID 386>: 

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPVPSKQ

51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAAKEAEQSD AEKAADKQPV

101 ADKADEVEEK ADEPEREKSD GQAVRKKALT EEREQTVGEK AQKKDAETVK

151 KQAVKPSKET EKKASKEEKK AEKEKVAPKP TPEQILNSGS IEKARSAAAK

201 EVQKMKTPDK AEATHYLQMG AYADRRSAEG QRAKLAILGI SSKVVGYQAG

251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK* 

ORF65a and ORF65-1 show 96.5% identity in 289 aa overlap: 

0 c 10 20 30 40 50 60 

orf6Sa pep mfwkfsqsgkglsgfffgliiatviiagilfylnqsgqnafkipvpskqpaetei^pk 
orf osa.pep n*™ Y| M I M 1 1 1 II 1 1 1 M 1 1 I I I I I I II M II 1 1 M I 1 1 : M M M M M I M 
orf65-l m^kfsqsgkglsgfffgliiatviiagilfylnqsc^nafkipasskqpaeteiucpk 

10 20 30 40 50 60 



30 



35 



70 80 90 100 110 120 

orf 65a pep NQPKEDIQPEPADQNALSEPDAAKEAEQSDAEKAADKQPVADKADEVEEKADEPEREKSD 
orf 65a. pep 7°^ux« w I I I I I I 1 I 1 1 I I I 1 1 1 I I I 1 1 1 1 1 I I I I MM: I 



Ml I I M Ml Mil Ht II Ml I MMIMMMIMMMMIMMM 
orf 65-1 NQP^DIQPEPADQNALSEPDAATC^ 



130 140 150 160 170 180 

40 orf 65-1 gqavrkKALTEEREQTVREKAOKKDAETVKKQAW 

- - 14Q 150 16Q 170 180 

190 200 210 220 230 240 

orf 65a pep TPEQI LNSGSIEKARSAAAKEVQKMKT PDKAEATHYLQMGAYADRRS AEGQRAKLAI LGI 
45 orf65a. P e P *™ , , , , , , , , , , , , , , , , , , , , , , I I M M M M I M I I 1 1 : M 1 1 I I I I I M M I 

orf 65-1 tpeqilnsgsiekarsaaakevqkmktsdkaeathylqmgayadrqsaegqrakiailg^ 

190 200 210 220 230 240 

250 260 270 280 290 

<0 orf65a pep SSKWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX 

orrwa.pep ( j | | j | j j | j j | m H I 1 1 1 1 1 1 1 M I I I H I 1 M II M M I 1 1 M I I I I 
orf 65-1 SSKWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX 
250 260 270 280 290 

Homology with a predicted ORF from N. gonorrhoeae

ORF65 shows 89.6% identity over a 212aa overlap with a predicted ORF (ORF65.ng) from N. gonorrhoeae:

30 40 50 60 70 80 

ORF65ng IIAGILLYI^QGGQNAFKIPAP^ | 

60 — — 



0RF65 



ILKPHNQLKEDIQPDPADQNALSEPDAATE 
10 20 30 
90 100 110 120 130 140

ORF65ng AEQS DAEKAADKQP VADKADE VEEKAGE PEREEPDGQAVRKKALTEEREQT VREKAQKKD 

I I I I i I | : I I I I I I I M I I I I U I I I I I I I I I I I I 1 I I I i I ! I | | | | M | I I t I II I I I I 
ORF65 AEQSDAENMDKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 
40 50 60 70 80 90 

150 160 170 180 190 200 

ORF65ng AETVKKKAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSRSIEKARSAAAKEVQKM 

I 1 I I I : | t I I I I I I I I f I t I I I I | M | | | | 1 I I I I I I I I I I I III II I I I II I I I U 
ORF65 AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSGSIEXARSAAAKEVQKM 
100 110 120 130 140 150 

210 220 230 240 250 260 

ORF65ng KNFGQGGSQRIICKWARMPNPGARKGSVPNWQSWAYLPKWSAIRRDIKRFTACKAAICPP 

I lilt I I I I I I I M I : I I I I I I I M I I 1 I I I I : I I : I I I I I I i I I : I I M I I I 
ORF65 XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP 
160 170 180 190 200 210 

ORF65ng MR 
I I 

ORF65 MR 

An ORF65ng nucleotide sequence <SEQ ID 387> was predicted to encode a protein having amino
acid sequence <SEQ ID 388>:



1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ

51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV 

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK 

151 KKAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK 

201 EVQKMKNFGQ GGSQRIICKW ARMPNPGARK GSVPNWQSWA YLPKWSAIRR 

251 DIKRFTACKA AICPPMR* 

After further analysis, the complete gonococcal DNA sequence <SEQ ID 389> was found to be: 



1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTCTT 

51 CTTCGGTTTG ATACTGGCAA CGGTCATTAT TGCCGGTATT TTGCTTTATC 

101 TGAACCAGGG CGGTCAAAAT GCGTTCAAAA TCCCGGCTCC GTCGAAGCAG 

151 CCTGCAGAAA CGGAAATCCT GAAACTGAAA AACCAGCCTA AGGAAGACAT 

201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGTTGCGA 

251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT 

301 GCCGACAAag ccgacgAGGT TGAAGAAAag GcGGgcgAgc cggaACGGga 

351 aGAGCCGGAC ggACAGGCAG TGCGCAAGAA AGCACTGAcg gAAGAgcGTG 

401 AACAAACcgt cagggAAAAA GCGCagaaga AAGATGCCGA AACGgTTAAA 

451 AAacaaGCgg tAaaaccgtc tAAAGAAACa gagaaaaaag cTtcaaaaga

501 agagaaaaag gcggcgaaag aaaAAGttgc acccaaaccg accccggaaC 

551 aaatcctcaa cagccgCagc atcgaaaaag cgcgtagtgc cgctgccaaa 

601 gaAgtgcaGA AAatgaaaaa ctTtgggcaa ggcgGaagcc aacgcattaT 

651 CTGcaaatgg gcgcgtatgc cgaccgtccg gagcgcggaA gggcagcgtg 

701 ccaaACtggc aAtcttgGgc atatctTccg aagtggtcgG CTATCAGGCG 

751 GGACATAAAA CGCTTTACCG CGTGCAAagc GGCAatatgt ccgccgatgc 

801 gGTGAAAAAA ATGCAGGACG AGTTGAAAAA GCATGGGGtt gcCAGCCTGA 

851 TCCGTGcgAT TGAAGGCAAA TAA 

This encodes the following amino acid sequence <SEQ ID 390>:

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ

51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK

151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK

201 EVQKMKNFGQ GGSQRIICKW ARMPTVRSAE GQRAKLAILG ISSEVVGYQA

251 GHKTLYRVQS GNMSADAVKK MQDELKKHGV ASLIRAIEGK *

ORF65ng-l and ORF65-1 show 89.0% identity in 290 aa overlap: 



10 20 30 40 50 60 

orf 65-1. pep MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQPAETEILKPK 
I ! I I I I II I I I I I I II I I I I I I I I II I I I I I : I I I I : I I I I I I I I I 1 I I | I 1 I | I f | | 
orf65ng-l MFMNKFSQSGKGLSGF FFGLILATVIIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLK 

10 20 30 40 50 60 




70 80 9° 100 110 120 

orf6S-l Deo NQPKEDIQPEPADQNALSEPDAATEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 

orf 65-1 .pep NUr 1111111111:1 1 1 1 1 1 1 1 I 1 1 1 I I I I I I I I I I I I I I I ' iililiiii 

orf65ng-l ia*™*^^ 

130 140 150 160 170 180 

or£65-l oeo GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAA^WAPKP 
orf 65-1 .pep <*ja , , , , , , , , , , , , , ,, , , , , , , , , 

orf65ng-l wiraJiii*^ 

* 130 140 150 160 1 IV AOU 

190 200 210 220 230 239 

orf 65-1 . pep TPEQI LN SGS IEKARSAAAKEVQKMKTS DKAEATHY L-QMGAYADRQSMGQR^l^LG 
° Ffc ^ ■ i I I I I 1 1 I I 1 1 1 1 I | | ; :: : : : : : : :| I MINI I MM 

orf65ng-l ^PEQILNSRSIEKARSAAAKEVQKb^ 

ion 200 210 220 



190 

240 250 260 nv 

«rf65-l oeo ISSKWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX 
orf 65 1. pep * ^ | M I M M M M M M M I H M I I ! M M M 1 M M M : H : M 
orf65na-l ISSEWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHGVASLIRAIEGKX 
ot y 250 260 270 280 290 

On this basis, including the presence of a putative transmembrane domain in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.
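
The specification does not state which program identified the putative transmembrane domain. As an illustration only, the sketch below applies a Kyte-Doolittle hydropathy scan, a common first-pass approach that is assumed here rather than taken from the source; the 19-residue window, the 1.6 threshold and the function name are likewise assumptions.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (0-based start, mean hydropathy) for windows at or above threshold.

    Assumption: unknown symbols such as 'X' or '*' score 0.0.
    """
    seq = "".join(protein.split()).upper()
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        score = sum(KD.get(res, 0.0) for res in segment) / window
        if score >= threshold:
            hits.append((start, round(score, 3)))
    return hits


if __name__ == "__main__":
    # Residues 15-33 of the ORF65 proteins listed above (GFFFGLILATVIIAGILFY).
    print(hydrophobic_windows("GFFFGLILATVIIAGILFY"))
```

Runs of consecutive high-scoring windows near the N-terminus of the ORF65 proteins would be candidates for the kind of transmembrane segment referred to above, although a dedicated predictor may draw the boundaries differently.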

Example 46 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
391>:

1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTJcTCTTCGG 

51 CGGAAcGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GcGTTTGa . s 

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAAtC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGAcCAaAC CCGCGTCCTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAaATCGGCA AACCGATATG 

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC 

401 CCGCCTGCCT tGCGgTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AgCGGTAGTG CGGCAACGGG

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTtTAG 

551 CAATCGGCAT TTTtTCCCTG CAACTGAAwA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This corresponds to the amino acid sequence <SEQ ID 392; ORF103>: 

1 MNHDITFLTL FLLGXFGGTH CIGMCGGLSS AFXXQLPPHI NRFWLILLLN

51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS 

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL 

151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLXKIMQNRY 

201 IRLCTGLSVS LWALWKLAVL WL* 

Further work elaborated the DNA sequence <SEQ ID 393> as: 

1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG 

51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCCTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG 
351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC

401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTAG 

551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 
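
Sequence listings in this format give the 1-based position of the first residue on each row, followed by blocks of ten residues. The sketch below shows one way such a listing might be joined into a single string while checking that the printed positions are consistent; the function name is hypothetical and the check assumes no rows are missing.

```python
def parse_numbered_block(text):
    """Join a position-numbered sequence listing into one contiguous string.

    Each non-empty line is expected to start with the 1-based position of its
    first residue, followed by whitespace-separated residue blocks.
    """
    chunks = []
    length = 0
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # blank spacer line between rows
        position = int(tokens[0])
        if position != length + 1:
            raise ValueError(
                f"row starts at {position}, expected {length + 1}"
            )
        for chunk in tokens[1:]:
            chunks.append(chunk)
            length += len(chunk)
    return "".join(chunks)


if __name__ == "__main__":
    # First two rows of SEQ ID 393 as printed above.
    listing = """
    1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG

    51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC
    """
    print(len(parse_numbered_block(listing)))
```

A position mismatch raises an error, which is a cheap way to detect a dropped row or a block of the wrong length before the sequence is used further.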



This corresponds to the amino acid sequence <SEQ ID 394; ORF103-1>: 

1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN

51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL

151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY

201 IRLCTGLSVS LWALWKLAVL WL*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF103 shows 93.8% identity over a 222aa overlap with an ORF (ORF103a) from strain A of N. meningitidis:

10 20 30 40 50 60. 

orfl03 pep MKHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI 
20 orfl03.pep mnh ||||||Jtllt II I III I I I I I I I I I I I I I II I 1 1 I 1 1 Ml I 

O-fl03a ^XDITFLkFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLI 

10 20 30 40 50 60 

70 80 90 100 110 120 

75 orfl03 pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFI/3LYLSGIS 

2b orfl03.pep Ml I I I I III H I I III I I I I I I I I I I I I I I I I I M M M I ! I I I I I 

orfl03a GLILGLIGQVGVSLDQTRVXQ^^ 

70 80 90 100 110 120 

™ 130 140 150 160 170 180 

o-fl03 Pep NPILNRLLPIKSI PACLAVG I LWGW L PCG L VY S AS LY ALG SG S AATGG L YMLAF A LGTL P 
o.fl03. P e P | | | | | | 1 | | | I I I I I I I I i I I 1 I t I 1 I I I I I I I H I I I I I I I 1 I I IN III I 

orfl03a npiwwlpiksipmiavgil^ 

130 140 150 160 170 180 

35 190 200 210 220 

orfl03 pep NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 
II 1 1 II I I II 1 I 1 1 M 1 1 M 1 1 1 M M 1 I M 1 M M I M I I I 

orf ^ 03a NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 
40 190 200 210 220 

The complete length ORF103a nucleotide sequence <SEQ ID 395> is: 

1 ATGAACCANG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG 

51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 

101 TCCAACTCCC CCCGCATATC AACCGCTTNT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT

201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCNTG CAGAATATTT

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC 

401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTA

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTNGG 

551 CAATCGGCAT TTTTTCCCTG CAACTGNAAA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA

This encodes a protein having amino acid sequence <SEQ ID 396>: 

1 MNXDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRXWLILLLN
51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVX QNILYTAANL LLLFLGLYLS
101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL
151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLXAIGIFSL QLXKIMQNRY
201 IRLCTGLSVS LWALWKLAVL WL*

ORF103a and ORF103-1 show 97.7% identity in 222 aa overlap: 

10 20 30 40 50 60 

orflC3a pep MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI 
M I I I I I 1 I I I 1 I I I ] I I 1 I I 1 i 1 I I I 1 1 1 1 I I 1 I I I I I I 1 I I I I I 1 M I 1 1 1 I I H I 
or f 103-1 MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSTTAI 

10 20 30 40 50 60 

70 80 90 100 .110 120 

orf 103a . pep GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 
tlMIIMMitltltlll I I I I I I f 1 1 1 1 1 I 1 I I I I I 1 t I I I I 1 t 1 I I I ! 1 1 I I I 1 1 I 
orf 103-1 GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 103a. pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 

I | | | | M I I I I I I | M II I I I I I I I I II I I I I I I I I I M I M I I I I II I 11 I I I I I I I I I 
orf 103-1 NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 

130 140 150 160 170 180 

190 200 210 220 

orf 103a . pep NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

II | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 
orf 103-1 NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

190 200 210 220 
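
The identity figures quoted for each alignment can be recomputed from the aligned rows. The sketch below is a minimal illustration, assuming identity is the number of columns with the same residue in both rows divided by the total number of aligned columns; the function name is hypothetical, and the program that produced the original alignments may count gap or terminal columns differently.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (identical columns, total columns, percent identity).

    Both inputs must already be aligned, i.e. have equal length with gap
    characters inserted where needed. A gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = len(aligned_a)
    if columns == 0:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return matches, columns, 100.0 * matches / columns


if __name__ == "__main__":
    # Last block of the ORF103a / ORF103-1 alignment shown above.
    orf103a = "NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX"
    orf103_1 = "NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX"
    print(percent_identity(orf103a, orf103_1))
```

For the full ORF103a and ORF103-1 sequences, treating the five unresolved X residues in ORF103a as mismatches over 222 residues gives 217/222, about 97.7%, which appears consistent with the figure stated above.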



Homology with a predicted ORF from N. gonorrhoeae

ORF103 shows 95.5% identity over a 222aa overlap with a predicted ORF (ORF103.ng) from N. gonorrhoeae:
orfl03.pep 
orfl03ng 
orfl03.pep 
orfl03ng 
orfl03.pep 
orf 103ng 
orf 10 3. pep 
orfl03ng 



MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI 60 

I | | | | II | | II I I I I I I I I I I I I I I 11 I I I I I I I I I II I I I I I I I II I I I : I I I I I I 
MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI 60 

GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120 
I I : || I II I : I : I I I II I I I I I I I I I I: I I I 1 I I 1 I I I I II I II I I I I M I I I I I I I I I I 
GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120 

NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 180 
MINIM III III Ml III II I I I I I I I I 1 U 1 I Mill 1111:11111 IIIMN III 
NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP 180 

NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222 
I I I I Ml I I II I I M I I M II I I I I I I I I I I I I I M II M I 
NLLAI G I FSLQLKKIMQNRYIRLCTGLSVS LWALWKLAVLWL 222 



The complete length ORF103ng nucleotide sequence <SEQ ID 397> is: 



1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTGCTCG GTTTCTTCGG
51 CGGAACTCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC
101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATTCT GCTGCTTAAC
151 ACAGGACGGA TAAGCAGCTA TACGGCAATC GGCCTGATGC TCGGATTAAT
201 CGGACAACTC GGCATTTCAC TCGACCAAAc ccgcgTCCTG CAAAATATTT
251 tatacacagc ctccaaCCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC
301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG
351 GCGCAACCTG AACCCGATAC TCAACCGGCT GCTGCCCATA AAATCCATAC
401 CCGCCTGCCT TGCTGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG
451 GTTTACAGCG CATCACTTTA CGCGCTGGGA AGCGGTAGTG CGACAACCGG
501 CGGACTGTAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTGG
551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT
601 ATCCGCCTGT GTACAGGATT ATCCGTATCA TTATGGGCAT TATGGAAGCT
651 TGCCGTCCTG TGGCTGTAA



This encodes a protein having amino acid sequence <SEQ ID 398>: 
1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN

51 TGRISSYTAI GLMLGLIGQL GISLDQTRVL QNILYTASNL LLLFLGLYLS

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL

151 VYSASLYALG SGSATTGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY

201 IRLCTGLSVS LWALWKLAVL WL*

In addition, ORF103ng and ORF103-1 show 97.3% identity in 222 aa overlap:

10 20 30 40 50 - 60 

erf 103-1 pep MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI 

I I I I I M I I I I I I 1 1 II III I II I 1 1 I I I Hi I I I I 1 1 I I m I II 1 1 I M II : HUM 
10 o-fl03nq MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 103-1 Pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

15 t|:||||||:t:i||llMllllilll:lllllllllllllllllilllllM II 

orfl03na GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 
* 70 80 90 100 110 120 

130 140 150 160 170 180 

20 orf 103-1 pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTL? 

' F t( ||tMIMIIIIIIMIIIIIIIilllilllilllMlt)ll:IIMIIMIIH!M 
orflC3na NP i L NRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP 
9 130 140 150 160 170 180 

95 190 200 210 220 

orf 103-1 pep NLLAIGIFSLOLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 
I | I 1 1 j 1 t t I I I 1 I I J I t I I I t I I 1 I 1 I I I I I 1 t 1 1 I I 1 I I I I 
orf!03nq NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

190 200 210 220 
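The identity figure quoted for this comparison is the number of identical aligned positions divided by the length of the overlap. As read from the alignment, the two ORF103 proteins differ at six of 222 positions, so 216/222 gives approximately 97.3%. The sketch below applies the same calculation to residues 61-120 only; the helper names are illustrative, and the two strings are copied from the corresponding alignment rows.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two gap-free aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def mismatches(a: str, b: str, offset: int = 1):
    """List (position, residue_a, residue_b) for every differing column."""
    return [(i + offset, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Residues 61-120 of ORF103-1 (meningococcal) and ORF103ng (gonococcal).
orf103_1 = "GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL"
orf103ng = "GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL"

print(mismatches(orf103_1, orf103ng, offset=61))
print(round(percent_identity(orf103_1, orf103ng), 1))
```

This simple count only applies to gap-free overlaps such as this one; alignments containing gaps need the gap columns handled explicitly.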

Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 47 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 399>:

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTT CGCTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGAT.TCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCcGAAGC GGCGaGGATT 

201 TTTCTTGGTG CTCATTCAGG CTGCTGCTGC TCGGCGTGGC GGGCATTTCG

251 GCAAACTTTG TGCTGATTGC CCAAGGGCTG CATTATATTT CGCCGACCAC 

301 GACGCAGGTT TTGTGGCAGA TTTCGCCGTT TACGATGATT GTwGTCGGTG 

351 TGTTGGTGTT TAAAGACCGG ATGACTGCCG CTCAGAAAAT CGGCTTGGTT 

401 TTGCTGCTTG CCGGTTTGCT TATGTATTTT AACGATAAAT TCGGCGAGTT 

451 GTCGGGTTTG GGCGCGTATG C.AAGGGCGT GTTGCTGTGT GCGGCAGGCA

501 GTATGGCATG GGTGTGTAAT GCCGTGGCGC AAAAGCTGCT GTCGGCGCAA 

551 TTCGGGCCGC AACAGATTCT GCTGTTGATT TATGCGGCAA GTGCCGCCGT 

601 GTTCCTGCCG TTTGCCGAAC CGGCACACAT CGGAAGTATG GACGGTACGT 

651 TGGCGTGGGT ATGTATTGCG TATTGCTGCT TGAATACGTT AATCGGTTAC 

701 GGCTCGTTCG GCGAGGCGTT GAAACATTGG GAGGCTTCCA AAGTCAGCGC

751 GGTAACAACC TTGCTCCCCG TGTTTACCGT AATAAATACT TTGCTCGGGC 

801 ATTATGTGAT GCCTGAAACT TTTGCCGCGC CGGA. . 

This corresponds to the amino acid sequence <SEQ ID 400; ORF104>: 

1 MENQRPLLGF RLALLAAMTW GTLPXSVRQV LKFVDAPTLV WVRFTVAAAV 

51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MYFNDKFGEL

151 SGLGAYXKGV LLCAAGSMAW VCNAVAQKLL SAQFGPQQIL LLIYAASAAV 

201 FLPFAEPAHI GSMDGTLAWV CIAYCCLNTL IGYGSFGEAL KHWEASKVSA 



251 VTTLLPVFTV INTLLGHYVM PETFAAP...

Further work revealed further partial DNA sequence <SEQ ID 401>: 

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA

151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT 

201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG 

251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG 

301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT 

351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT

401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG 

451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG 

501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT 

551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG 

601 TTCCTGCCGT TTGCCGAACC GGCACACATC GGAAGTTTGG ACGGTACGTT

651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG 

701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG 

751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATAwTwwCTT TGCTCGGGCA 

801 TTATGTGATG CCTGAAACTT TTGCCGCGCC GGA. . . 

This corresponds to the amino acid sequence <SEQ ID 402; ORF104-1>:

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV
51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT
101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL
151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV
201 FLPFAEPAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA
251 VTTLLPVFTV IXXLLGHYVM PETFAAP...
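Some nucleotide listings contain ambiguity codes (for example the block ATAwTwwCTT at positions 781-790 of SEQ ID 401), and the corresponding residues appear as X in the protein (IXXL at residues 261-264). A minimal sketch of that relationship follows, assuming the standard genetic code and the usual IUPAC meanings of the ambiguity letters. Treating "." as a fully unknown base is an assumption made here for illustration, and `translate_ambiguous` is a hypothetical helper name.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# IUPAC nucleotide codes; "." as "any base" is an assumption for this sketch.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", ".": "ACGT",
}


def translate_ambiguous(dna: str) -> str:
    """Translate in frame, emitting X where a codon's possible readings disagree."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        options = [IUPAC[base] for base in dna[i:i + 3]]
        residues = {CODON_TABLE["".join(codon)] for codon in product(*options)}
        # A codon is only resolved if every expansion encodes the same residue.
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# Nucleotides 781-792 of SEQ ID 401 (in frame): ATA wTw wCT TTG
print(translate_ambiguous("ATAwTwwCTT TG"))
# An ambiguous third base that does not change the residue (GTA and GTT are both Val).
print(translate_ambiguous("GTwGTC"))
```

An ambiguity code at a degenerate third position can therefore still yield a defined residue, whereas one at the first or second position usually produces X.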

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical HI0878 protein of H. influenzae (accession number U32769)

ORF104 and HI0878 show 40% aa identity in 277aa overlap: 

30 orfl04 4 QRPLI^FRUaiJ^TWGTLPXS^QVLKFVDAPTLVWXXXXXXXXXXXXXXXXXXXXP- 62 

Q+PLLGF AL+ AM WG+LP +++QVL ++A T+VW P 
HI0878 3 QQPLU3FTFALITAMAWGSLPIALKQ\n,SVMNAQTIVVrYRFIIAAVSLLALLAYKKQLPE 62 

orfl04 63 — KRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 120 
35 K R ++W ++L+GV G+++NF+L + L+YI P+ Q+ +S F M++ GVL+F 

HI0878 63 LMKVRQYAW IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118 

orfl04 121 KDRMTAAQKIXXXXXXXXXXMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 180 
K+++ QKI ++FND+F +GL Y GV+L G++ WV +AQKL+ 

40 HI0878 119 KEKLGLHQKIGLFLLLIGLGLFFNDRFDAFAGU^QYSTGVILGVGGALIWVAYGMAQKLM 178 

orfl04 181 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 240 

+F QQILL++Y A F+P A+ + + + LA +C YCCLNTLIGYGS+ EAL 
HI0878 179 LRKFN SQQI LLMMYLGCAI AFMPMADFSQVQELT- PLALICFI YCCLNTL IGYGS YAEAL 237 

orfl04 241 KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 277 

W+ SKVS V TL+P+FT++ + + HY P FAAP 
HI0878 238 NRWDVSKVSWITLVPLFTILFSHIAHYFSPADFAAP 274 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF104 shows 95.3% identity over a 277aa overlap with an ORF (ORF104a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 104 . pep MENQRPUiGFRlJUJAAMTWGTLPXSVRQVUC^ 

55 ii in nil! i tin in in n : 1 1 1 1 1 n 1 1 1 1 1 1 n 1 1 1 1 1 1 1 1 1 1 n n 1 1 n i 

orf 104a MENQRPLIX3FA1ALLAAMTWGTLP IAVRQVLKFV DAPTLVWVRFTVAAAVLFVLLALGGR 

10 20 30 40 50 60 

70 80 90 100 110 120

orf 104. pep 
orf!04a 



orf 104. pep 
orfl04a 



orfl04.pep 



orfl04a 



orf 104 .pep 
orfl04a 



LPKRRDFSWCSFRLLLLGVAGI SANFVLIAQGLHYI S PTTTQVI^QIS PFTMI VVGVtVF 
80 90 100 l 1 ^ I 20 



70 



170 



180 



130 140 150 160 

KDRMTAAQKIGLVLUAGLUtfFNDKrc^ 

II | i | 1 1 || . t | | I | | j | I I I : I 1 I I I I I I I 1 1 I I I I I I I H H H I 1 1 M I I I I I I I 
KDMTAAQKIGLVL^^ 

130 140 150 160 170 



180 



230 



240 



190 200 210 220 

SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 

I \ | | | | || | | | M | 1 | | | I I II I I I I I I I I I : I I I I I I M : I I I I I ! I I I I I I I I I I I I 
SAQFGPQOILLLIYAASAAVFLPFAELAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL 

190 200 210 220 230 240 

250 260 270 

KHWEASKVSAVTTLLPVrrVINTLLGHYVMPETFAAP 
| | | | | | | | 1 | | U | | | I I I I I : I I I I I I I 1 : I H I I 

KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMKGLGYAGALVVVGGAVTAAVG 
250 260 270 280 290 300 



The complete length ORF104a nucleotide sequence <SEQ ID 403> is: 



1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC
51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG
101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA
151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGT GGCGGGATTT
201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG
251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG
301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT
351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT
401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG
451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG
501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT
551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG
601 TTCCTGCCGT TTGCCGAACT GGCACACATC GGAAGTTTGG ACGGTACGTT
651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG
701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG
751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA
801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT
851 ATGCCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG
901 GACAGGCTGT TCAAACGCCG CTAG



This encodes a protein having amino acid sequence <SEQ ID 404>: 



1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV
51 LFVLLALGGR LPKWRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT
101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL
151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV
201 FLPFAELAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA
251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYAGALVV VGGAVTAAVG
301 DRLFKRR*



ORF104a and ORF104-1 show 98.2% identity in 277 aa overlap: 






orf 104a. pep 
orfl04-l 

orfl04a.pep 
orfl04-l 

orf 104a .pep 



10 20 30 40 50 60 

MENQRPLI£FA1JU*LAAMTWGTLPIAVRQVLKFVDAPTLVWVRF^ 
1 1 1 1 1 1 1 1 1 1 I I I I I 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 I I t I I f 1 1 1 I I I I I 1 I I I 11 I I 1 
MENQRPLIXSFAIJUjIAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVL^ 

10 20 30 40 50 60 

70 80 90 100 110 120 

LPKWR D FS WC S FRLL LLG VAG I S AN FVL IAQG LH Y I S PTTTQVLWQI S P FTM I WGVL V F 
HI MIlllMIMIIItllltlltllMIIIIIIMIIIIIIIIIIillMMlllM 
LPKRRDFSWCSFRLLLLGVAGI SAN FVLI AQGLH Y I S PTTTQVLWQI SPFTMIWGVLVF 
70 80 90 100 110 120 

130 140 150 160 170 180 

KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 H I M I II I 1 1 1 1 1 1 1 1 1 1 1 1 1 M I M 1 1 1 1 II I I 1 1 M I I 
orf 104-1 KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf 104a . pep saqfgpqqillliyaasaavflpfaelahigsldgtlawvcfaycclntligygsfgeal 

llimitllllimilllt IIIIIMIIMIIIMIMIIMIMIIIIH 

orf 104-1 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGBAL 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf 104a . pep KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYAGALVVVGGAVTAAVG 

I t t I I I I I I I I t I 1 1 I I 1 1 1 I I M I It I ! : I I I II 
orf 104-1 KHWEASKVSAVTTLLPVFTVIXXLLGHYVMPETFAAP 
250 260 270 

Homology with a predicted ORF from N. gonorrhoeae

ORF104 shows 93.9% identity over a 277aa overlap with a predicted ORF (ORF104.ng) from N. 
gonorrhoeae: 

orf 104 .pep 

orf 104ng 

orf 104 .pep 

orf 104ng 

orf 104 .pep 

orf 104ng 

orf 104 .pep 

orf 104ng 

orf 104 .pep 

orf 104ng 



MENQRPLLGFRIJUjIJ^MTWGTLPXSWQVLKFVDAPTLVWVRFTVAAAVLFVLLiALGGR 60 

I t I I I II I I I II I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 

MENQRPLI^FALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR . 60 

LPKRRDFS WCS FRLLLLGVAGI SAN FVLIAQGLH YI S PTTTQVLWQI S PFTMI WGVLVF 120 
I I | I | 1 I I I I I I I I I I I I: I II I I I I I I I I I I I I I I I I I I I I 11 I II I II I I II I I I I ! 

LPKRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYIS PTTTQVLWQI SPFTMIVVGVLVF 120 

KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 180 
I I M II I I I I I I I I I I: I I I I : I I II I M I I I I I I I I I I I I II I I I I I I II I I I I I I I 

KDRMTAAQK I GLVLLLVGLLMFFN DKFGELSGLGAYAKGVLLCAAGSMAWVC YAVAQKLL 180 

SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 240 
I f I I I 1 I I I I I I I I I II I I I I I I I II I I M : I I I 1 I I I 1 I I I I I M I II I II t II I 

SAQFGPQQILLLIYAASAAVFLLXAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL 240 

KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 277 
I II I I I I I I I I I I I I 1 I I I I I : I I II I I I I : I I I I I 

KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYVGALVWGGAVTAAVG 300 



The complete length ORF104ng nucleotide sequence <SEQ ID 405> is predicted to encode a
protein having amino acid sequence <SEQ ID 406>:

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV
51 LFVLLALGGR LPKRRDFSWH SFRLLLLGVT GISANFVLIA QGLHYISPTT
101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL
151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV
201 FLLXAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA
251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG
301 DRPFKRR*

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 407>: 



1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC
51 GATGACGTGG GGGACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG
101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA
151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT
201 TTCTTGGCAT TCATTCAGGC TGCTGCTGCT CGGCGTGACG GGCATTTCGG
251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG
301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGCGT
351 GTTGGTGTTT AAAGACCGGA tgaCTGCCGC GCAGAAAATC GGTTTGGTTT
401 TGCTGCttgT CGGTttgCTT ATGTTTTtta ACGACAAATT CGGCGAGTTG
451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG
501 TATGGCCTGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT
551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGcaag tgccgccGTG
601 TTCCtgccgT TTGccgaaCC GGCACACATC GGAAGTTTgg aCGGTACGtt
651 GGCGTGGGTT TGTTTTGTGT ATTGCTGCTT GAATACGTTA ATCGGTTACG
701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG
751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA
801 itS cSgatactt ttgccgcgcc ggatatgaac ggtttgggtt
851 ATGTCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG
901 GACAGGCCGT TCAAACGCCG CTAG

This corresponds to the amino acid sequence <SEQ ID 408; ORF104ng-1>:

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV
51 LFVLLALGGR LPKRRDFSWH SFRLLLLGVT GISANFVLIA QGLHYISPTT
101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL
151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV
201 FLPFAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA
251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG
301 DRPFKRR*



ORF104ng-1 and ORF104-1 show 97.5% identity in 277 aa overlap:



50 



60 



orf 104-1. pep 
orf 104ng-l 



orf 104-1. pep 
orfl04ng-l 



orfl04-l.pep 
orfl04ng-l 



orf 104-1. pep 
orfl04ng-l 

orfl04-l.pep 
orfl04ng-l 



10 20 30 40 

MENQRPLLGFAlJaLAAMTWGTLPIAVRQVIJCFVDAPTLVWVRrrVAAAVLFVLLALGGR 

millllllllllllllMIIMIIUIIIIIIIMMMIIIIIIMIIMIMIIII 
MENQI^ 

10 20 30 40 50 60 

70 80 90 100 110 120 

LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 
LPKRRDFSWOb ^ | | 1 1 | 1 | M | | | M ! I 1 M 1 I I 1 1 I I M I I I M I I I I I I 

LP^RDFSWHS^LLLLGVTGISAN^LIAQGLHY 

70 80 90 100 110 120 

130 140 150 160 170 180 

KDRMTAAQKIGLVLLIAGLLMFFNDKFGELSGI^GAYAKGVLLCAAGSMAWVCYAVAQKLL 

I Till It I H I 1 1 1 1 1 = H I M 1 1 1 1 1 1 I » 1 1 1 1 1 1 M 1 1 1 1 1 I I 1 I I I I 1 1 1 1 1 1 ! 1 1 1 

K^T^QKI^ 

130 140 150 160 170 180 



230 



240 



190 200 210 220 

SAOFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCUJTLIGYGSFGEAL 

III II III I I I I 1 1 1 1 I n 1 1 1 1 1 1 1 1 1 1 1 n ■ » « » I r I I I I I I I I 1 1 1 1 1 1 1 1 1 

iiiFGPQQILLLIY^SMVFLPFAEPM 

!90 200 210 220 230 240 



250 260 270 

KHWEASKVSAVTTLLPVFTVIXXLLGHYVMPETFAAP 



In addition, ORF104ng-1 shows significant homology with a hypothetical H. influenzae protein:
gi|1573895 (U32769) hypothetical [Haemophilus influenzae] Length = 306
SZ^l S^^TOiaS^ 168/28C (59*,. Gaps - 8/280 .»> 

Query 30 qrpxxXXXXXXXXXMTWGTLPIAVRQVLKFVDAPTLWXXXXXXXXXXXXXXXXXXXXP- 88 

O+P M WG+LPIA++QVL ++A T+VW P 

Sbjct: 3 QQPLLGrrFALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62 

Query: 89 --KRRDFSWHSFiaLLWVTGISANFVLIAQGI^YIS^ 146 

J K R ++w ++L+GV G+++NF+L + L+YI P+ Q+ +S F M++ GVL+F 
Sbjct: 63 LMKVRQYAW IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIt 118 

Query: 147 KDRMTAAQKIXXXXXXXXX^ 206 

K . OKI +FFND+F +GL Y+ GV+L G++ WV Y +AQJU.+ 

Sbjct: 119 1^KU5LHQKIGLFLLLIGLGLFFNDRFDAFAGLNQYSTGVILGVGGALIWVAYGMAQKLM 178 

Query: 207 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLWT^ 266 

+F QQILL++Y A F+P A+ + + L LA +CF+YCCLNTLIGYGS+ EAL 
Sbjct: 179 LRKFTJS^ILUlMYlXjCAIAFMPMADFSQVQELT-PLALICFIYCCLNTLIGYGSYAEAL 237 



Query: 267 KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMN 306
             W+ SKVS V TL+P+FT++FS + HY  P  FAAP++N
Sbjct: 238 NRWDVSKVSVVITLVPLFTILFSHIAHYFSPADFAAPELN 277

Based on this analysis, including the presence of a putative leader sequence and several putative
transmembrane domains in the gonococcal protein, it is predicted that the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.
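The text does not state how the putative transmembrane domains were located. One common way to look for candidate membrane-spanning stretches is a sliding-window hydropathy scan, sketched below using the Kyte-Doolittle scale. The window length of 19 and the threshold of 1.6 are commonly quoted heuristics rather than values taken from this document, and the function names are illustrative. Scoring unknown residues (X) as 0.0 is also an assumption.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence."""
    # Unknown residues such as X are scored 0.0 (an assumption of this sketch).
    scores = [KYTE_DOOLITTLE.get(residue, 0.0) for residue in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(protein: str, window: int = 19, threshold: float = 1.6) -> list:
    """1-based start positions of windows whose mean hydropathy meets the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, score in enumerate(profile) if score >= threshold]


# Residues 101-150 of ORF104-1 (SEQ ID 402), used here only as sample input.
segment = "TQVLWQISPFTMIVVGVLVFKDRMTAAQKIGLVLLLAGLLMFFNDKFGEL"
print(candidate_tm_windows(segment))
```

A scan of this kind only flags hydrophobic stretches; it does not distinguish a cleavable leader sequence from a membrane-spanning helix, so its output is a starting point rather than a prediction of topology.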

Example 48 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 409>: 

1 ATGGTAGCTC GTCGGGCTCA TAACCCGAAG GTCGTAGGTT CGAATCCTGT 

51 .CCCGCAACC TAATTTCAAA CCCCTCGGTT CAATGCCGAG GG.GTTTTGT

101 T.TTGCCTGT TTCCTGTTTC CTGTTTCCTG CCGCCTCCGT TTTTTGCCGG 

151 ATTTTCCTTC CGGCCGCAAT ATCGGAACGG CAGACCGCCG TCTGTTTGCG 

201 GTTGCAAATT CAGGCAGTTT GGCTACAATC TTCCGCATTG TCTTCAAGAA 

251 AGCCAACCAT GCCGACCGTC CGTTTTACCG AATCCGTCAG CAAACAAGAC 

301 CTTGATGCTC TGTTCGAGTG GGCAAAAGCA AGTTACGGTG CAGAAAGTTG

351 CTGGAAAACG CTGTATCTGA ACGGTCysCC TTTGGGCAAC CTGTCGCCGG 

401 AATGGGTGGA ACGCGTsimnA AAAGACTGGG AGGCAGGCTG CyCGGAGTCT 

451 TCAGACGGCA TTTTTCTGAA TgCGGACGGc TGgCctGATA TGGgCGGAcg

501 cTTACAGCAC CTCGCCCTCG GTTGGCACTG TGCGGGGCTG TTGGACGgsT 

551 GGCGCAACGA GTGTTTCGAC CTGACCGACG GCGGCGGCAA CCCCTTGTTC

601 ACGCTCGaAc GCGCCGyTTT mCGTCCTkTC GGACTGCTCA GCCGCGCCGT 

651 CCATCTCAAC GGTCTGACCG AATCGGACGG CCGATGGCAT TTCTGGATAG 

701 GCAGGCGCAG TCCGCACAAA GCAGTCGATC CCAACAAACT CGACAATACT 

751 rCCGCCGGCG GTGTTTCCGG CGGCGAAATG CCGTCTGAAG CCGTGTGTCG 

801 CGAAAGCAGC GAAGAAGCCG GTTTGGATAA AACGCTGcTT CCGCTCATCC

851 GCCCGGTATC GCAGCTGCAC AGCCTGCGCT CCGTCAGCCG GGGTGTACAC 

901 AATGAAATCC TGTATGTATT CGATGCCGTC CTGCCG... 

This corresponds to the amino acid sequence <SEQ ID 410; ORF105>: 

1 MVARRAHNPK VVGSNPXPAT XFQTPRFNAE XVLXLPVSCF LFPAASVFCR

51 IFLPAAISER QTAVCLRLQI QAVWLQSSAL SSRKPTMPTV RFTESVSKQD

101 LDALFEWAKA SYGAESCWKT LYLNGXPLGN LSPEWVERVX KDWEAGCXES 

151 SDGIFLNADG WPDMGGRLQH LALGWHCAGL LDGWRNECFD LTDGGGNPLF 

201 TLERAXXRPX GLLSRAVHLN GLTESDGRWH FWIGRRSPHK AVDPNKLDNT 

251 XAGGVSGGEM PSEAVCRESS EEAGLDKTLL PLIRPVSQLH SLRSVSRGVH 

301 NEILYVFDAV LP...

Further work revealed the complete nucleotide sequence <SEQ ID 411>:

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC 

51 TCTGTTCGAG TGGGCAAAAG CAAGTTACGG TGCAGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ACCTGTCGCC GGAATGGGTG 

151 GAACGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG

201 CATTTTTCTG AATGCGGACG GCTGGCCTGA TATGGGCGGA CGCTTACAGC 

251 ACCTCGCCCT CGGTTGGCAC TGTGCGGGGC TGTTGGACGG CTGGCGCAAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA 

351 ACGCGCCGCT TTCCGTCCTT TCGGACTGCT CAGCCGCGCC GTCCATCTCA 

401 ACGGTCTGAC CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC

451 AGTCCGCACA AAGCAGTCGA TCCCAACAAA CTCGACAATA CTGCCGCCGG 

501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGT CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA 

601 TCGCAGCTGC ACAGCCTGCG CTCCGTCAGC CGGGGTGTAC ACAATGAAAT 

651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC

701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG 

751 GATGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 

This corresponds to the amino acid sequence <SEQ ID 412; ORF105-1>:

1 MPTVRFTESV SKQDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWV
51 ERVKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLALGWH CAGLLDGWRN
101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLTESD GRWHFWIGRR
151 SPHKAVDPNK LDNTAAGGVS GGEMPSEAVC RESSEEAGLD KTLLPLIRPV
201 SQLHSLRSVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL
251 DAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF105 shows 89.4% identity over a 226aa overlap with an ORF (ORF105a) from strain A of N.
meningitidis:

60 70 80 90 100 110 

orfl05.pep ISERQTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAES 

I I I I I I I I I I 1 1: 1 I I I I I I M I M I I I M 
orflOSa MPTVRFTESVSKHDLDALFEWAKASYGAES 

10 20 30 

120 130 140 150 160 170 

orf 105 . pep CWKTLYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWH 

I I I I I t I I I HllM!ll:lli lllllli II lllllllll! I I I I I I I: 

orf 105a CWKTLYLNGLPLGNLSPEWAERVKKDWEAGCSESSDGIFLNADGWPDMGRRLQHLARIWK 
40 50 60 70 80 90 

180 190 200 210 220 230 

orf 105 . pep CAGLLDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRR 
Mil I It : t I I I f I I I I : I I I I : I I I i It I I I I I I I I I I I! : I t I I I I I I II 1 I I 
orf 105a EAGLLHGWRDECFDLTDGGSNPLFALERAAFRPFGLLSRAVHLNGLVESDGRWHFWIGRR 
100 110 120 130 140 150 

240 250 260 270 280 290 

orf 105. pep SPHKAVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVS 

I I I I I I II : I I I I I I I I 1 I : I t : I I I : I t I I I I I I I I I I I I I I t I I I I I I II I I I I II 
orf 105a SPHKAVDPDKLDNTAAGGVSSGELPSETVCRESSEEAGLDKTLLPLIRPVSQLHSLRPVS 

160 170 180 190 200 210 

300 310 
orf 105. pep RGVHNEI LYVFDAVLP 

II I I I I I II I I i I I I t 

orf 105a RGVHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLAAMLSGNMMH DAQLVTLDAF 

220 230 240 250 260 270 
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35 



The complete length ORF105a nucleotide sequence <SEQ ID 413> is:

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACACG ACCTTGATGC
51 CCTATTCGAG TGGGCAAAGG CAAGTTACGG TGCGGAAAGT TGCTGGAAAA
101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ATCTGTCGCC GGAATGGGCG
151 GAGCGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG
201 CATTTTCCTG AATGCGGACG GCTGGCCAGA TATGGGCAGA CGCTTGCAGC
251 ACCTCGCCCG AATATGGAAA GAAGCGGGAC TGCTTCACGG CTGGCGCGAC
301 GAGTGTTTCG ACCTGACCGA CGGCGGCAGC AATCCCTTGT TCGCGCTCGA
351 ACGCGCCGCT TTCCGTCCGT TCGGACTGCT CAGCCGCGCC GTCCATCTCA
401 ACGGTTTGGT CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC
451 AGTCCGCACA AAGCAGTCGA TCCCGACAAA CTCGACAATA CTGCCGCCGG
501 CGGTGTTTCC AGCGGTGAAT TGCCGTCTGA AACCGTGTGT CGCGAAAGCA
551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA
601 TCGCAGCTGC ACAGCCTGCG CCCCGTCAGC CGGGGTGTGC ACAATGAAAT
651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC
701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG
751 GCTGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT
801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG
851 AGTGGCTGGA CGGCATACGT TTATAG



This encodes a protein having amino acid sequence <SEQ ID 414>:

1 MPTVRFTESV SKHDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWA
51 ERVKKDWEAG CSESSDGIFL NADGWPDMGR RLQHLARIWK EAGLLHGWRD
101 ECFDLTDGGS NPLFALERAA FRPFGLLSRA VHLNGLVESD GRWHFWIGRR
151 SPHKAVDPDK LDNTAAGGVS SGELPSETVC RESSEEAGLD KTLLPLIRPV
201 SQLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL
251 AAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L*

ORF105a and ORF105-1 show 93.8% identity in 291 aa overlap:



orflOSa.pep 
orfl05-l 

orf 105a. pep 
orfl05-l 

orflOSa.pep 
orfl05-l 

orflOSa.pep 
orfl05-l 

orf 105a. pep 
orfl05-l 



10 20 30 40 50 60 

MPTVRFTESVSKHDLDALFEWAKASYGAESCWKTLYLNGLPIX3NLSPEW 

in 20 30 40 50 60 



10 
70 



40 
100 



70 80 90 100 110 

CSESSDGIFLNADGWPDMGRRLQHU^IWKEAGLUiGWRDECFDLTDGGSNPLFALERAA 

Mt llllll III ii inn MINI i: mi ,,,:M,m!,,: "JU "ii 

CSESSDGIFLRADGWPD^GRWHIJUjGWHCAGLL 

70 80 90 100 110 120 

130 140 150 160 170 180 

frpfgllsravhlnglvesdgrwhfwigrrsphkavdpdkldntaaggvssgelpsetvc 

, 1 1 , i , , , i , , i n 1 1 : || |M | | HI I I I III I I I I 1:1 I III III II I 1:1 Is I I I • ' I 
raPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC 

130 140 150 160 170 180 

190 200 210 220 230 240 

RESSEEAGLDKTLLPLIRPVSQLHSI^RPVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 

Y7,, 1 1 m ii ii mm inn ii i nmiimiiiiiiiiMiiiniMMii 

^SS^GLDKTLLPLIRPVSQUiSLRSVSR(^HNEILYVFOAVLPETFLPENQDGEVAG 
190 200 210 220 230 240 

250 260 270 280 290 

FEKMDIGGLLAAMLSGKMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX 

in, linn imimnnmiimmmmmmiimi 

FEKMDIGGLLDAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX 
250 260 270 280 290 
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120 



Homology with a predicted ORF from N. gonorrhoeae

ORF105 shows 87.5% identity over a 312aa overlap with a predicted ORF (ORF105.ng) from N. 
gonorrhoeae: 



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orf 105. pep 
orflOSng 
orfl05.pep 
orflOSng 
orf 105. pep 
orflOSng 
orfl05.pep 
orflOSng 
orflOS.pep 
orfl05ng 
orflOS.pep 
orflOSng 



MVARRAHNPKWGSNPXPATXFQTPRFNAEXVLXLPVSCFLFPAASVFCRI FLPAAI SER 

mm in i ii it ii in =11111111 n i 1 1 1 1 1 1 1 1 1 1 1 1 ii 1 1 1 1 1 1 

MVARRAHNPKWGSNPAPATKYQTPRFNAEGVLF FLFPAASV FCRI FL PAAI S ER 

OTAVCLRLQIQAWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAESCWKT 

7. | m , mi , n in inn mem ii milium mi mmimm 

QMVCIJU*QIQAVWLQSSALCSRKPAMPTVRFTESVSKQDLDALFERAKASYGAESCWKT 

LYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWHCAGL 
mi | I | , | in ,: , I: | III I || III : Ml 111 HI I IN m HI I I : HI 
LYLNRLPLGNLS PEWAERIKKDWEAGCSE SSNG I FLNADGWPDMGGRLQHLARTWNKAGL 

LDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRRSPHK 

i 1 1 M i M m i m 1 1 1 1 1 ii m 1 1 urn immiMuiiiimmmi 

^G^ECFDLTDGGGNPLFTLERAAFRPFGLLIRAVHJUNGLVESNGRWHFWIGRRSPHK 

AVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPL1RPVSQLHSLRSVSRGVH 

IIIUIIII : M I I I I 1 1 I I I I I IN Ml IN I IN I : M I I M I : II II I J I I I M II 
AVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVH 

NEILYVFDAVLP 

NEILWFDAV^ETFLPENQDGEVAGFEKMDIGGLLDAMLSK^ 



60 



120 



115 
180 
175 
240 
235 
300 
295 
312 
355 



A complete length ORF105ng nucleotide sequence <SEQ ID 415> was predicted to encode a
protein having amino acid sequence <SEQ ID 416>:

1 MVARRAHNPK VVGSNPAPAT KYQTPRFNAE GVLFFLFPAA SVFCRIFLPA

51 AISERQAAVC LRLQIQAVWL QSSALCSRKP AMPTVRFTES VSKQDLDALF 

101 ERAKASYGAE SCWKTLYLNR LPLGNLSPEW AERIKKDWEA GCSESSNGIF 

151 LNADGWPDMG GRLQHLARTW NKAGLLHGWR NECFDLTDGG GNPLFTLERA 

201 AFRPFGLLIR AVHLNGLVES NGRWHFWIGR RSPHKAVDPG KLDNIAGGGV 

251 SGGEMPSEAV CRESSEEAGL DKTLFPLIRP VSRLHSLRPV SRGVHNEILY 

301 VFDAVLPETF LPENQDGEVA GFEKMDIGGL LDAMLSKNMM HDAQLVTLDA 

351 FYRYGLIDAA HPLSEWLDGI RL* 

Further work revealed the complete nucleotide sequence <SEQ ID 417>: 

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC 

51 CCTGTTCGAG CGGGCAAAAG CAAGTTACGG TGCCGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACCGTCTT CCTTTGGGCA ATCTGTCGCC GGAATGGGCT 

151 GAGCGCATCA AAAAAGACTG GGAGGCAGGC TGCTCCGAGT CTTCAGACGG 

201 CATTTTTCTG AATGCGGACG GCTGGCCGGA TATGGGCGGA CGCTTGCAGC 

251 ACCTCGCCCG CACATGGAAC AAGGCGGGGC TGCTTCACGG ATGGCGCAAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA 

351 ACGCGCCGCT TTCCGTCCGT TCGGACTACT CAGCCGCGCC GTCCATCTCA 

401 ACGGTTTGGT CGAATCGAAC GGCAGATGGC ATTTTTGGAT AGGCAGGCGC 

451 AGTCCGCACA AAGCAGTCGa tcCCGGCAAG CTCGACAATA TTGCCGGCGG 

501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGC CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGT TTCCGCTCAT CCGCCCAGTA 

601 TCGCGGCTGC ACAGCCTTCG CCCCGTCAGC CGAGGTGTGC ACAATGAAAT 

651 CCTGTATGTG TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA GGTAGCGGGT TTTGAAAAGA TGGACATTGG CGGCCTATTG 

751 GATGCCATGT TGTCGAAAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TACCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 

This corresponds to the amino acid sequence <SEQ ID 418; ORF105ng-1>:



1 MPTVRFTESV SKQDLDALFE RAKASYGAES CWKTLYLNRL PLGNLSPEWA 

51 ERIKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLARTWN KAGLLHGWRN 

101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLVESN GRWHFWIGRR 

151 SPHKAVDPGK LDNIAGGGVS GGEMPSEAVC RESSEEAGLD KTLFPLIRPV 

201 SRLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 DAMLSKNMMH DAQLVTLDAF YRYGLIDAAH PLSEWLDGIR L*

ORF105ng-1 and ORF105-1 show 93.5% identity in 291 aa overlap:



10 20 30 40 50 60 

orf 105- 1 . pep MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG 
I 1 I I 1 1 I I I I I I I II M M I I I I I I I I I I II I I 1 I II I II I I I I I II : I I : II I I I I I 
orflOSng-1 MPTVRFTESVSKQDLDALFERAKASYGAESCWKTLYLNRLPLGNLSPEWAERIKKDWEAG 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf105-1.pep CSESSDGIFLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA
I I I I I I I I I I I I I I I I I I I I I I I I I I I: Mil 1 I I I I I I I N I I I I I I I I I I I I I I 
orfi05ng-l CSESSDGIFLNADGWPDMGGRLQHLARTWNKAGLLHGWRNECFDLTDGGGNPLFTLERAA 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 105-1 .pep FRPFGLLSRAVHLNGLTESDGRWHFWIGRRS PHKAVDPNKLDNTAAGGVSGGEMPSEAVC 
I I I I I I I I 1 1 I I I 1 I I : I I : I I I I I I I I I I I I I I II I I : I M I I : I M I I I I I I I I I II 
orfl05ng-l FRPFGLLSRAVHLNGLVESNGRWHFWIGRRS PKKAVDPGKLDNIAGGGVSGGEMPSEAVC 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 105-1. pep RESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 
I I I I I I II II I I I : I f I I I I I : II I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
orfl05ng-l RESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 

190 200 210 220 230 240 



250 260 270 280 290 

orf 105-1. pep FEKMDIGGLL DAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDG I RLX 
I I I I I I ! I I I I I I I I I IMIIIIIMIII I I I I I I I I I I I I I I I I M II I 
orfl05ng-l FEKMDIGGLLDAMLSKNMMHDAQLVTLDAFYRYGLIDAAHPLSEWLDGIRLX 

250 260 270 280 29C 




Furthermore, ORF105ng-1 shows homology with a yeast enzyme:

sp|P41888|TNR3_SCHPO THIAMIN PYROPHOSPHOKINASE (TPK) (THIAMIN KINASE)
^,siirnfi928lDirl IS52350 thiamin pyrophosphokmase (EC 2.7.6.2) - £l ?"°" y 

, aSESSS 523 S^SSSM 

pyrophosphokinase [Schizosaccharomyces pombe] Length = 569

ffi^ sri.??s%,rK5^'- c«%i. G a PS .3/ 192 (») - 

10 Query 268 NKAGLLHGWRNEC FDLT DGGGN PLFTLERAAFRPFGLLSRAVHLNGLVESNGBW J 441 

m r+ upup + + P+ +ER F FG LS VH + + "* 

Sbjct: 96 NTFGIADQWRNELYTVYGKSKKPVLAVERGGFMLFGFLSTGVHCTMYIPATKEHPLRIKV 155 

Query 442 GRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSliR 621 
ic nRt;p K p ldn GG++ G+ + +E SEEA LD + LI P ♦ 

Sbjct: 156 PRRSPTKQTWPNYLDNSVAGGIAHGDSVIGTMIKEFSEEANLDVSSMNLI-PCGTVSYIK 214 

Ouerv 622 PVSRG-VHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVT 798 
uuery. o« + + +p OGEVAGF + + +L 4 K+ + LV 

20 Sbjct: 215 MEKRHWIQPELQYVFDLPVDDLVIPRINDGEVAGFSLLPLNQVLHELELKSFKPNCALVL 274 

Query: 799 LDAFYRYGLIDAAHP 843 

LD R+G+I HP 
Sbjct: 275 LDFLIRHGI ITPQHP 289 

Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 49 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
419>:

1 ATGAATAGAC CCAAGCAACC CTTCTTCCGT CCCGAAGTCG CCGTTGCCCG 

51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT

101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT 

151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT 

201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGgATACG rGkACAATTA

251 CAGCGAAATT CGTGGAAGAT GGmsAAAAGG TTAAGGCTGG CGACAAGCTA

301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGGTAGCG TGCAGCAGCA 

351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG 

401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAgCcT TAAAGCAACT 

451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG

501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT 

551 TCCTATCCGC .CAATGA 

This corresponds to the amino acid sequence <SEQ ID 420; ORF107>: 

1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF 

51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT XTITAKFVED GXKVKAGDKL

101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT

151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSXQ* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF107 shows 97.8% identity over a 186aa overlap with an ORF (ORF107a) from strain A of N. meningitidis:

                       10        20        30        40        50        60
orf107.pep     MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf107a        MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf107.pep     TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT
               |||||||||||||||||||| |||||| ||| ||||||||||||||||||| ||||||||
orf107a        TVEGQILPASGVIRVYAPDTGTITAKFXEDGEKVKAGDKLFALSTSRFGAGDSVQQQLKT
                       70        80        90       100       110       120

                      130       140       150       160       170       180
orf107.pep     EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf107a        EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ
                      130       140       150       160       170       180

orf107.pep     KYRFLSXQX
               ||||||
orf107a        KYRFLSANDAVPKQEMMNVKAELLEQKAKLDAYRREEVGLLQEIRTQNLTLXSLPQAAX
                      190       200       210       220       230

The complete length ORF107a nucleotide sequence <SEQ ID 421> is:

1 atgaatagac CCAAGCAACC NTTCTTCCGT CCCGAAGTCG CCGTTGCCCG

51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT

101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT

151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT

201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGGATACG GGGACAATTA

251 CNGCGAAATT CNTGGAAGAT GGAGAAAAGG TTAAGGCTGG CGACAAGCTA

301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGATAGCG TGCAGCAGCA

351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG

401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAGCCT TAAAGCAACT

451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG

501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT

551 TCCTATCCGC CAATGATGCA GTGCCAAAAC AAGAAATGAT GAATGTCAAG

601 GCAGAGCTTT TAGAGCAGAA AGCCAAACTT GATGCCTACC GCCGAGAAGA

651 AGTCGGGCTG CTTCAGGAAA TCCGCACGCA GAATCTGACA TTGGNNAGCC

701 TCCCCCAAGC GGCATGA

This encodes a protein having amino acid sequence <SEQ ID 422>: 

1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF

51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT GTITAKFXED GEKVKAGDKL

101 FALSTSRFGA GDSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT

151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSANDA VPKQEMMNVK 

201 AELLEQKAKL DAYRREEVGL LQEIRTQNLT LXSLPQAA* 

Homology with a predicted ORF from N. gonorrhoeae

ORF107 shows 95.7% identity over a 188aa overlap with a predicted ORF (ORF107.ng) from N. gonorrhoeae:

orf107.pep     MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 60
               |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
orf107ng       MNRPKQPFFRPEVAIARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 60

orf107.pep     TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT 120
               | |||||||||||||||||| |||||||||| ||||||||||||||||||||||||||||
orf107ng       TMEGQILPASGVIRVYAPDTGTITAKFVEDGEKVKAGDKLFALSTSRFGAGGSVQQQLKT 120

orf107.pep     EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ 180
               |||||||||||||||||||| |||||||||||||||| |||||||||||||||||||||
orf107ng       EAVLKKTLAEQELGRLKLIHENETRSLKATVERLENQKLHISQQIDGQKRRIRLAEEMLR 180

orf107.pep     KYRFLSXQ 188
               |||||| |
orf107ng       KYRFLSAQ 188

The complete length ORF107ng nucleotide sequence <SEQ ID 423> is predicted to encode a
protein having amino acid sequence <SEQ ID 424>:

1 MNRPKQPFFR PEVAIARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF

51 LIFGNYTRKT TMEGQILPAS GVIRVYAPDT GTITAKFVED GEKVKAGDKL

101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH ENETRSLKAT

151 VERLENQKLH ISQQIDGQKR RIRLAEEMLR KYRFLSAQ* 

Based on the presence of a putative transmembrane domain in the gonococcal protein, it is predicted
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful
antigens for vaccines or diagnostics, or for raising antibodies.

Example 50

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 425>:

1 ATGCTGAATA CTTTTTTTGC CGTATTGGGC GGCTGCCTGC TGCT.TTGCC 

51 GTGCGGCAAA TCCGTAAATA CGGCGGTACA GCCGCAAAAC GCGGTACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCATAT ATATCGACAA TACGGCGATT

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC 

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT 

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA 

This corresponds to the amino acid sequence <SEQ ID 426; ORF108>: 

1 MLNTFFAVLG GCLLXLPCGK SVNTAVQPQN AVQSAPKPVF KVIYIDNTAI

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Further work revealed the following DNA sequence <SEQ ID 427>: 

1 ATGCTGAAAA CATCTTTTGC CGTATTGGGC GGCTGCCTGC TGCTTGCCGC

51 CTGCGGCAAA TCCGAAAATA CGGCGGAACA GCCGCAAAAC GCGGTACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ATATCGACAA TACGGCGATT 

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC 

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA

This corresponds to the amino acid sequence <SEQ ID 428; ORF108-1>: 

1 MLKTSFAVLG GCLLLAACGK SENTAEQPQN AVQSAPKPVF KVKYIDNTAI

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC 

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y*
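
The relationship between a nucleotide listing and the amino acid sequence it "corresponds to" can be checked with a short script. The following is a minimal sketch, not the procedure used in this specification: it assumes the standard genetic code (NCBI translation table 1), reading frame 1, and, as an illustrative convention, reports any codon containing an unresolved base as `X`. The function name `translate` is hypothetical. The input is the first 50-nucleotide row of <SEQ ID 427> as printed above.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 1 into one-letter amino acid codes."""
    # Remove the spaces and line breaks used to lay out the listing; ignore case.
    seq = "".join(dna.split()).upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        # Assumption for illustration: codons with unresolved bases are shown as 'X'.
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


# First row of <SEQ ID 427> as printed in the listing above.
first_row = "ATGCTGAAAA CATCTTTTGC CGTATTGGGC GGCTGCCTGC TGCTTGCCGC"
print(translate(first_row))  # MLKTSFAVLGGCLLLA
```

The printed residues agree with the start of <SEQ ID 428> (MLKTSFAVLG GCLLLA...). The same function can be applied to the other listings, bearing in mind that bases printed as `.` or `N` will yield `X` under the convention assumed here.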

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. gonorrhoeae

ORF108 shows 88.4% identity over a 181aa overlap with a predicted ORF (ORF108.ng) from N.
gonorrhoeae:



orf108.pep     MLNTFFAVLGGCLLXLPCGKSVNTAVQPQNAVQSAPKPVFKVIYIDNTAIAGLDLGQSSE 60
               ||   |||||||||   |||| ||| ||||| |||||||||| |||||||||| ||||||
orf108ng       MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60

orf108.pep     GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120
               |||||||||||||||||||||  || |||| ||||| ||||||| || | ||||||||||
orf108ng       GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAVVGKCMETDGKDAPSGWAENGVCHT 120

orf108.pep     LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181
               |||||||||||||||||||| || |||||||||||||||||||||||||||||||||||||
orf108ng       LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181



15 



ORF108-1 shows 92.3% identity with ORF108ng over the same 181 aa overlap: 



orf108-1.pep   MLKTSFAVLGGCLLLAACGKSENTAEQPQNAVQSAPKPVFKVKYIDNTAIAGLDLGQSSE 60
               |||  |||||||||||||||||||||||||| ||||||||||||||||||||| ||||||
orf108ng       MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60

orf108-1.pep   GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120
               |||||||||||||||||||||  || |||| ||||| ||||||| || | ||||||||||
orf108ng       GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAVVGKCMETDGKDAPSGWAENGVCHT 120

orf108-1.pep   LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181
               |||||||||||||||||||| || |||||||||||||||||||||||||||||||||||||
orf108ng       LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181
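
A percent identity figure of the kind quoted for these alignments can be reproduced for any pair of aligned rows. The sketch below is illustrative only and is not the alignment software used for this specification: it assumes the two rows are already aligned column by column, contain no gaps, and that identity is simply the fraction of columns holding the same letter. The function name `percent_identity` is hypothetical. The inputs are the first 60 residues of ORF108-1 and ORF108ng as printed in the listings above.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of aligned columns in which both rows carry the same residue."""
    # Remove layout whitespace and ignore case.
    a = "".join(row_a.split()).upper()
    b = "".join(row_b.split()).upper()
    if not a or len(a) != len(b):
        raise ValueError("rows must be non-empty and of equal aligned length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# First 60 residues of each sequence, copied from the listings in this Example.
orf108_1_row1 = ("MLKTSFAVLG GCLLLAACGK SENTAEQPQN "
                 "AVQSAPKPVF KVKYIDNTAI AGLDLGQSSE")
orf108ng_row1 = ("MLKIPFAVLG GCLLLAACGK SENTAEQPQN "
                 "AAQSAPKPVF KVKYIDNTAI AGLALGQSSE")

print(f"{percent_identity(orf108_1_row1, orf108ng_row1):.1f}%")  # 93.3%
```

For this first row, 56 of 60 columns are identical. Applying the same count to all 181 aligned columns gives the overall figure for the pair; alignments containing gaps would need a gap-handling rule that this sketch does not define.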

The complete length ORF108ng nucleotide sequence <SEQ ID 429> is: 

1 ATGCTGAAAa tacctTTTGC CGTGTtgggc ggCtgcctGC TGCTTGCCGC 

51 CTGCGGCAAA TCCGAAAATa cggcggaACA GCCGCAAAAT gcggCACAAA

101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ACATCGACAA TACGGCGATT 

151 GCCGGTTTGG CTTTGGGACA AAGTAGCGAA GGCAAAACCA acgacgGCAA 

201 AAAACAAATC AGTTATccgA TTAAAGGCTT GCCGGAACAA Aacgccgtcc 

251 gGCTGACCGG AAAGCATCCC AACGACTTGG AagccgtcgT CGGCAAATGT 

301 ATGGAAACCG ACGGAAAGGA CGCGCCTTCG GGCTGGGCGG AAAACGGCGT

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC TGATTACCTG ATTTCGCATT CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GagggGGCGT TTTATttccg ccgccgccat tattgA 

This encodes a protein having amino acid sequence <SEQ ID 430>:

1 MLKIPFAVLG GCLLLAACGK SENTAEQPQN AAQSAPKPVF KVKYIDNTAI

51 AGLALGQSSE GKTNDGKKQI SYPIKGLPEQ NAVRLTGKHP NDLEAVVGKC

101 METDGKDAPS GWAENGVCHT LFAKLVGNIA EDGGKLTDYL ISHSALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein
lipid attachment site (underlined) and a putative ATP/GTP-binding site motif A (P-loop,
double-underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
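
A motif scan of the kind referred to above can be sketched with a regular expression. The example below is an illustration under stated assumptions, not the analysis software used for this specification: it uses the PROSITE consensus pattern PS00017 for the ATP/GTP-binding site motif A (P-loop), `[AG]-x(4)-G-K-[ST]`, and the function name `find_p_loops` is hypothetical. The input is the first 70 residues of <SEQ ID 430> as printed above. Short patterns of this kind are known to match by chance, so a hit is a prediction only.

```python
import re

# PROSITE PS00017, ATP/GTP-binding site motif A (P-loop): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein: str):
    """Return (start, end, matched residues) for each P-loop match, 1-based inclusive."""
    # Remove layout whitespace and ignore case.
    seq = "".join(protein.split()).upper()
    return [(m.start() + 1, m.end(), m.group()) for m in P_LOOP.finditer(seq)]


# First 70 residues of <SEQ ID 430> (ORF108ng) as printed in the listing above.
orf108ng_start = (
    "MLKIPFAVLG GCLLLAACGK SENTAEQPQN AAQSAPKPVF KVKYIDNTAI "
    "AGLALGQSSE GKTNDGKKQI"
)
print(find_p_loops(orf108ng_start))  # [(56, 63, 'GQSSEGKT')]
```

The pattern matches residues 56 to 63 of the gonococcal sequence. Whether this span coincides exactly with the double-underlined residues of the original document cannot be confirmed from the extracted text, since the underlining has not survived.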

Example 51

The following DNA sequence was identified in N. meningitidis <SEQ ID 431>:

1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC

51 CGgATTTATC GATgcgatTg cGggCGGGGG TGGTTTGATT ACGCTGCCCG

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG

151 CTGCAAaCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA

201 aggtttgatt gattggaaga AAGGTCTCCC GATTGCCGCA GCATCGTTTG

251 taggcggcgt ggccggtgca ttatcggtca gcttggtttc CAAAGATATT

301 CTgCTgGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT

351 gttttcgccc aagctcgacg gcagtaagga aggcaaagcc agaatgtctt

401 TTTTTCTGTT cGGGCTGACG GTCGC.ACCG CTTTTGGGTT TTTACGACGG

451 TGTGTTCGGA CCGGGTGTCG GCTCGTTTTT TCTGATTGCC TTTATTGTTT

501 TGCTCGGCTG CAAgCTGTTG AACGCGATGT CTTACACCAA ATTGGCGAAC

551 gttgcctgca atcttggttc gctatcggta TTCCTGCTGC ACGGTTCGAT

601 3£mS AtSgSaA CGaTGGCGGT CGGTGCGTTT GTCGGtGCGA

651 ATTTAgGTGC GAGATTTGCC GTaCgctTCG GTTCGAAGCT GATTAA

This corresponds to the amino acid sequence <SEQ ID 432; ORF109>:

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFVGGVAGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VXTAFGFLRR

151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD

201 YFPDCGNDGG RCVCRCEFRC EICRTLRFEA D*

Further work revealed the following DNA sequence <SEQ ID 433>: 

1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCCG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG

151 CTGCAAGCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG

251 TAGGCGGCGT GGCCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT

301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT

401 TTTTTCTGTT CGGGCTGACG GTCGCACCGC TTTTGGGTTT TTACGACGGT

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG

551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT

601 StttcccX TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC

701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA

This corresponds to the amino acid sequence <SEQ ID 434; ORF109-1>: 

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFVGGVAGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG

151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI

201 IFPIAATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE

251 RNPLYQMIVS MF*

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF109 shows 95.9% identity over a 147aa overlap with an ORF (ORF109a) from strain A of N.
meningitidis:

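                       10        20        30        40        50        60
orf109.pep     MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109a        MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf109.pep     TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
               ||||||||||||||||||||||||||||||||| ||| ||||||||||||||||||||||
orf109a        TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                       70        80        90       100       110       120

                      130       140       150       160       170       180
orf109.pep     KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ
               |||||||||||||||||||||    ||
orf109a        KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                      130       140       150       160       170       180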

The complete length ORF109a nucleotide sequence <SEQ ID 435> is: 

£ S5KK 35555 S5SSS SS5S S 

i Ei Si i sssss ssss ssss 

J$ arGTTTGATT GMtSSgA AAGGTCTCCC GATTGCGGCA GCATCGTTTG 
III r^rrrJcGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

111 ^gctgIcgg 5SS£ tttgttgata tttgtcgcgc tgtattttgt 

Ml cttt?cgccc aagctcgacg gcagtaagga aggcaaagcc agaatgtctt 

ll\ cggStgacg gttgcaccac ttttgggttt ttacgacggt 

45! gtgttcggac cgggtgtcgg ctcgtttttt ctgattgcct ttattgtttt 

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 
11] TT^rCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 
I^TTCCCct TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 

III Sggtgcg agatttgccg tccgcttcgg ttcgaagctg ATTAAGCCGC 

701 TGCTGATTGT aScAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG 
751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA 

This encodes a protein having amino acid sequence <SEQ ID 436>:

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG

151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI

201 IFPIAATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE

251 RNPLYQMIVS MF*

ORF109a and ORF109-1 show 99.2% identity in 262 aa overlap: 



                       10        20        30        40        50        60
orf109a.pep    MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1       MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf109a.pep    TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
               ||||||||||||||||||||||||||||||||| ||| ||||||||||||||||||||||
orf109-1       TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                       70        80        90       100       110       120

                      130       140       150       160       170       180
orf109a.pep    KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1       KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                      130       140       150       160       170       180

                      190       200       210       220       230       240
orf109a.pep    LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1       LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                      190       200       210       220       230       240

                      250       260
orf109a.pep    SMAVKLLIDERNPLYQMIVSMFX
               |||||||||||||||||||||||
orf109-1       SMAVKLLIDERNPLYQMIVSMFX
                      250       260

Homology with a predicted ORF from N. gonorrhoeae

ORF109 shows 98.3% identity over a 231aa overlap with a predicted ORF (ORF109.ng) from N.
gonorrhoeae:

orf109.pep     MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109ng       MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60

orf109.pep     TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120
               ||||||||||||||||||||||||||||||||| ||| ||||||||||||||||||||||
orf109ng       TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120

orf109.pep     KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180
               ||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
orf109ng       KLDGSKEGKARMSFFLFGLTVATAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180

orf109.pep     IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRTLRFEAD 231
               |||||||||||||||||||||||||||||||||||||||||||| ||||||
orf109ng       IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRPLRFEAD 231



An ORF109ng nucleotide sequence <SEQ ID 437> was predicted to encode a protein having amino
acid sequence <SEQ ID 438>:

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VATAFGFLRR

151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD

201 YFPDCGNDGG RCVCRCEFRC EICRPLRFEA D*

Further work revealed the following gonococcal DNA sequence <SEQ ID 439>: 

1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATCGC 

51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCTG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 TTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTATT CGGGCTGACG GTTGCACCGC TTTTGGGTTT TTACGACGGT

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 

551 TTGCTTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 

601 ATTTTCCCGA TTGTGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC

701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG 

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA

This corresponds to the amino acid sequence <SEQ ID 440; ORF109ng-1>:

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG

151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI

201 IFPIVATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE

251 RNPLYQMIVS MF*

ORF109ng-1 and ORF109-1 show 98.9% identity in 262 aa overlap:

                       10        20        30        40        50        60
orf109ng-1.pep MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1       MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf109ng-1.pep TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
               ||||||||||||||||||||||||||||||||| ||| ||||||||||||||||||||||
orf109-1       TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                       70        80        90       100       110       120

                      130       140       150       160       170       180
orf109ng-1.pep KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1       KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                      130       140       150       160       170       180

                      190       200       210       220       230       240
orf109ng-1.pep LANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
               |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
orf109-1       LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                      190       200       210       220       230       240

                      250       260
orf109ng-1.pep SMAVKLLIDERNPLYQMIVSMFX
               |||||||||||||||||||||||
orf109-1       SMAVKLLIDERNPLYQMIVSMFX
                      250       260

In addition, ORF109ng-1 shows homology to a hypothetical Pseudomonas protein:

sp|P29942|YCB9_PSEDE HYPOTHETICAL 27.4 KD PROTEIN IN COBO 3' REGION (ORF9)
>gi| 94984 |pirl7l38164 hypothetical protein 9 - Pseudomonas sp >gi 1 551929 
(M62866) ORF9 [Pseudomonas denitrificans] Length - Zbi

£SL^ n - S55iJ 4 ?;U,! ,, KitIi'i 3 i3i/2i4 <«o%,. G a PS = 1 / 2 » ™ 

Query: 41 PPVSAIATNKLQXXXXXXXXXXXXXRKGLIDWKKGLPIXXXXXXXXXXXXXXXXXXXKDI 100
Sbjct: 43 pkQTWTNKLQGLFGSGSATLSYARRGHVNLKEOLPMALMSAAGAVLGALIATIVPGDV 102 

Query: 101 -W-£^^ 160 
Sbjct: 103 LKAILPFLLIAIALYFGLKPNM-GDVDQHSRVTPFVFTLTLVPLIGFYDGVFGPGTGSFF 161

Query: 161 LIAFIVLLGCKLLNAMSYTKLANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGA 220
yuery. J +L ft ++TK N N+G+ VFL G++++ + M +G F+GA +G+ 

Sbjct: 162 MWFVTLAGFGVLKATAHTKFLNFGSNVGAFGVFLFFGAVLWKVGLLMGLGQFLGAQVGS 221 

Query: 221 RFAVRFGSKLIKPLLIVISISMAVKLLIDERNPL 254

R+A+ G+K+IKPLL+++SI++A++LL D +PL 
Sbjct: 222 RYAMAKGAKIIKPLLVIVSIALAIRLLADPTHPL 255

Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 52 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 441>:

1 . CTGCTAGGGT ATTGCATCGG TTATCGGTAC G§CTGTTGCA GCAAAACCAG

51 cCGCAGACGG ATTATTTGGT CAAATTCGGA TCGTTTTGGG CGAG.ATTTT

101 TGGTTTTCTG GGACTGTATG ACGTCTATGC TTCGGCATGG TTTGTCGTTA

151 TCATGATGTT TTTGGTGGTT TCTACCAGTT TGTGCCTGAT TCGCAATGTG

201 CCGCCGTTCT GGCGCGAAAT GAAGTCTTTT CGGGAAAAGG TTAAAGAAAA

251 atcTCTGGCG GCGATGCGCC ATTCTTCGCT GTTGGATGTA AAAATTGCGC

301 CCGAGCTTGC CAAACGTTAT CTGGAAGTAC AAGGTTTTCA GGGGAAAACC

351 ATTAACCGTG AAGACGGGTC GGTTCTGATT GCCGCCAAAA AAGGCACAAT

401 GAACAAATGG GGCTATATCT TTGCCCATGT TGCTTTGATT GTCATTTGCC

451 TGGGCGGGTT GATAGACAGT AACCTGCTGT TGAAACTGGG TATGCTGACC

501 GGTCGGATTG TTCCGGACAA TCAGGCGGTT TATGCCAAGG ATTTC.AAGC

551 CCGAAAGTAT .TTTGGGTGC gTCCAATCTC TCATTTAGGG GCAACGTCAA

601 TATTTCCG.A GGGGCAGAgT GCGGATGTGG TTTTCCTGA

This corresponds to the amino acid sequence <SEQ ID 442; ORF110>:

1 ..LLGIASVIGT LLQQNQPQTD YLVKFGSFWA XIFGFLGLYD VYASAWFVVI

51 MMFLVVSTSL CLIRNVPPFW REMKSFREKV KEKSLAAMRH SSLLDVKIAP

101 EVAKRYLEVQ GFQGKTINRE DGSVLIAAKK GTMNKWGYIF AHVALIVICL 

151 GGLIDSNLLL KLGMLTGRIF RTIRRFMPRI XKPESXFGCV QSLI*GQRQY 

201 FXRGRVRMWF S* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with ORF88a from N. meningitidis (strain A)

ORF110 shows 91.5% identity over a 188aa overlap with ORF88a from strain A of N. meningitidis:



                       10        20        30        40        50        60
orf88a.pep     MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA
                                             |||||||||| |||||||||||||||||||
orf110                                       LLGIASVIGTLLQQNQPQTDYLVKFGSFWA
                                                     10        20        30

                       70        80        90       100       110       120
orf88a.pep     QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH
                |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf110         XIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH
                       40        50        60        70        80        90

                      130       140       150       160       170       180
orf88a.pep     SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf110         SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL
                      100       110       120       130       140       150

                      190       200       210       220       230       240
orf88a.pep     GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF
               |||||||||||||||||||            ||||  |
orf110         GGLIDSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF
                      160       170       180       190       200       210

                      250       260       270       280       290       300
orf88a.pep     LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT

orf110         SX

However, ORF88 and ORF110 do not align, because they represent two different fragments of the
same protein. 

Homology with a predicted ORF from N. gonorrhoeae 

ORF110 shows 88.6% identity over a 211aa overlap with a predicted ORF (ORF110.ng) from N. gonorrhoeae:



orf110.pep                                   LLGIASVIGTLLQQNQPQTDYLVKFGSFWA 30
                                             |||||||||| ||||||||||||||| ||
orf110ng       MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGPFWT 60

orf110.pep     XIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 90
                || ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf110ng       RIFDFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120

orf110.pep     SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 150
               ||||||||||||||||||| ||||||  ||||||||||||||||||||| ||||||||||
orf110ng       SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIXAHVALIVICL 180

orf110.pep     GGLIDSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF 210
               | ||  ||||||||| | |||  || |||| ||||  | ||||| |||||| || |||||
orf110ng       GRLINXNLLLKLGMLAGSIFRNNRRVMPRISKPESIWGGVQSLIKGQRQYFQRGKVRMWF 240

orf110.pep     S 211
               |
orf110ng       S 241

The complete length ORF110ng nucleotide sequence <SEQ ID 443> is predicted to encode a
protein having amino acid sequence <SEQ ID 444>:

1 MSKSRISPTL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD

51 YLVKFGPFWT RIFDFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE

151 DGSVLIAAKK GTMNKWGYIX AHVALIVICL GRLINXNLLL KLGMLAGSIF

201 RNNRRVMPRI SKPESIWGGV QSLIKGQRQY FQRGKVRMWF S*

Based on the putative transmembrane domains in the gonococcal protein, it is predicted that the
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for
vaccines or diagnostics, or for raising antibodies.






Example 53 

The following DNA sequence was identified in N. meningitidis <SEQ ID 445>: 

1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCGTCT TGATATTTGC

51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG 

101 TTACCCTGCA AGGCGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT 

151 TCAAATAATC GGGACAAACT CCCCTCACCT GCCGAAATAC AAAAACGCAT 

201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG 

251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC

301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC GCCTGAACCG 

351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT 

401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA 

451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA 

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGCGAGTT 

651 GCACGGCAAA GGCAAAAACG CGCGCGGCGA ACCGTGGCGC ATCGGTATCG 

701 AGCAGCCCAA TATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG 

751 AACAACCGTT CGCTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA

801 TAAAAACGGC AAACGCCTCT CCCATATCAT CAACCCGAAC AACAAACGAC 

851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGGTCGCAGA CAGTGCGATG 

901 ACGGCGGACG GCTTGTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC 

951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG 

1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC

1051 CGCTAA 

This corresponds to the amino acid sequence <SEQ ID 446; ORF111>:

1 MPSETRLPNF IRVLIFALGF IFLNACSEQT AQTVTLQGET MGTTYTVKYL

51 SNNRDKLPSP AEIQKRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR

101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ

151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE

201 LEKYGIQNYL VEIGGELHGK GKNARGEPWR IGIEQPNIVQ GGNTQIIVPL

251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVADSAM

301 TADGLSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL

351 R*

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF111 shows 96.9% identity over a 351aa overlap with an ORF (ORF111a) from strain A of N. meningitidis:



                       10        20        30        40        50        60
orf111a.pep    MPSETRLPNFIRTLIFALSFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDXLPSP
               |||||||||||| ||||| |||||||||||||||||||||||||||||||||||| ||||
orf111         MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf111a.pep    AEIQXRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVHLNRLTH
               |||| |||||||||||||||||||||||||||||||||||||||||||||||| ||||||
orf111         AEIQKRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH
                       70        80        90       100       110       120

                      130       140       150       160       170       180
orf111a.pep    GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf111         GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK
                      130       140       150       160       170       180

                      190       200       210       220       230       240
orf111a.pep    AYLDLSSIAKGFGVDXVAGELEKYGIQNYLVEIGGELHGKXKNARGEPWRIGIEQPNIVQ
               ||||||||||||||| |||||||||||||||||||||||| |||||||||||||||||||
orf111         AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNARGEPWRIGIEQPNIVQ
                      190       200       210       220       230       240

                      250       260       270       280       290       300
orf111a.pep    GGNTQIIVPLNNRSXATSGDYRIFHVDKSGKRLSHIINPNNKRPISHNLASISVXADSAM
               |||||||||||||| ||||||||||||| ||||||||||||||||||||||||| |||||
orf111         GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVADSAM
                      250       260       270       280       290       300

                      310       320       330       340       350
orf111a.pep    TADGXSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX
               |||| |||||||||||||||||||||||||||||||||||||||||||||||
orf111         TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX
                      310       320       330       340       350



The complete length ORF111a nucleotide sequence <SEQ ID 447> is:

1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCACCT TGATATTTGC

51 CCTGAGTTTT ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG

101 TTACCCTGCA AGGTGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT

151 TCAAATAATC GGGACNAACT CCCNTCACCT GCCGAAATAC AAAANCGCAT

201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG

251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC

301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC ACCTGAACCG

351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT

401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA

451 ATCAAACAAG CAGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATNANGT TGCGGGCGAA

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGNGAGTT

651 GCACGGCAAA GNCAAAAACG CGCGCGGCGA ACCTTGGCGC ATCGGCATCG

701 AACAGCCCAA CATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG

751 AACAACCGTT CGNTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA

801 TAAAAGCGGC AAACGCCTCT CCCATATCAT TAATCCGAAC AACAAACGAC

851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGNTCGCAGA CAGTGCGATG

901 ACGGCGGACG GCTTNTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC

951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG

1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC

1051 CGCTAA


This encodes a protein having amino acid sequence <SEQ ID 448>: 

1 MPSETRLPNF IRTLIFALSF IFLNA CSEQT AQTVTLQGET MGTTYTVKYL 
51 SNNRDXLPSP AEIQXRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR 
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101 ISSDFAHVTA EAVHLNRLTH GALDVTVGPL VNLWGFGPDK 

SVTREPSPEQ
151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDXVAGE
201 LEKYGIQNYL VEIGGELHGK XKNARGEPWR IGIEQPNIVQ GGKTQIIVPL
251 NNRSXATSGD YRIFHVDKSG KRLSHIINPN NKRPISHNLA SISVXADSAM
301 TADGXSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL
351 R*



Homology with a predicted ORF from N. gonorrhoeae

ORF111 shows 96.6% identity over a 351 aa overlap with a predicted ORF (ORF111.ng) from N.
gonorrhoeae:



40 



50 



60 






orf lllng 
orflll 

orflll 
orflll 

orflllng 
orflll 

orflllng 
orflll 

orflllng 
orflll 

orflllng 
orflll 



in 20 30 

MPSETIU,PHLIRALIFAI/3FIFLNACSEQTAOTVTL^ 

H^SE^^PNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP 
20 30 40 50 



10 



120 



70 80 90 100 110 

AK1QKRIDDALKEVNRQMSTYQTDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLKRLTH 

7V,Yi i 1 1 1 1 1 1 1 1 1 1 1 IIMllllllliilllMilllllMtltiMlllli 

miiqi^iddLke^ 



70 



80 



90 



100 



110 
170 



120 
180 



13O 140 150 160 

galdvtvgplvnlwgfgpdksvtrepspeqikqmsytgidkiilqqgkdyasls^hpk 
. 1 1 1 I I I I 1 I ] I I 1 I I I I I I i I I I I I I I I I 1 1 I ) I I I I I I I 1 1 M : I M I t I i I I M I M 
GALDVTVGPLVNLWGFGPDKSVTREPS PEQIKQAAS YTGI DKI I LKQGKDY AS LSKTHPK 

150 160 170 180 



130 



140 



230 



240 



190 200 210 220 

AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGEI^GKGKNAHGEPWRIGIEQPNIIQ 

1 1 1 1 11 ; 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 i 1 1 1 1 1 1 « « 1 1 1 1 1 1 1 1 1 * 1 • 1 1 • * 1 1 1 » • » * 1 ' • * 
^[dIssIakg^ 



190 



200 



230 
290 



300 



250 260 270 280 

GGNTQIIVPI^NRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNIASISWSDSAM 

GGNTQIIVPLNNR^ 

250 260 270 280 290 300 



350 



310 320 330 340 

TADGLSTGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKLLRX 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 : 1 1 1 = 1 1 1 1 1 1 1 1 1 1 1 1 iiniMiu inn 

TADGLSTGLFVI^ 

310 320 330 340 350 



The complete length ORF111ng nucleotide sequence <SEQ ID 449> is:
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55 



1 ATGCCGTCTG AAACACGCCT GCCGAACCTT ATCCGCGCCT TGATATTTGC
51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGaacaaacC GCGCAaaccg
101 TTACCCTGCA AGGCGAAAcg aTGGGTACGA CCTATACCGT CAAATACCTT
151 TCAAATAATC GGGACAAACT CCCCTCCCCT GCCAAAATAC AAAAGCGCAT
201 TGATGATGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TACCAGACCG
251 ATTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC
301 ATTTCAAGCG ATTTCGCACA CGTTACCGCC GAAGCCGTCC GCCTGAACCG
351 CCTGACTCAC GGCGCACTGG ACGTAACCGT CGGCCCTTTG GTCAACCTTT
401 GGGGGTTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA
451 ATCAAACAGG CGGCATCTTA TACGGGGATA GACAAAATCA TTTTGCAACA
501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAA GCCTATTTGG
551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA
601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAAtcg gcggcGAGTT
651 GCACGGCAAA GGCAAAAATG CGCACGGCGA ACCGTGGCGC ATCGGTATAG
701 AGCAACCCAA TATCATCCAA GgcgGCAata CGCAGATTAt cgtcccgctg
751 aaCaaccgtt cgctTGCCAC TTCCGGCGAT TAccgtaTTT tccacgtcgA
801 TAAAAAcggc aaacgccttt cccacaTCAT CAATCCCaAC aacAAACgac
851 ccATCAGcca caacctcgcc tccatcagcg tggtctcAGA CAGTGCAATG
901 ACGGCGGACG GTTtatCCAC AGGATTATTT GTTTTAGGCG AAACCGAAGC
951 CTTAAGGCTG GCAGAACAAG AAAAACTCGC TGTTTTCCTA ATTGTCCGGG
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1001 ATAAGGACGG CTACCGCACC GCCATGTCTT CCGAATTTGC CAAGCTGCTC 
1051 CGCTAA 

This encodes a protein having amino acid sequence <SEQ ID 450>: 

1 MPSETRLPNL IRALIFALGF IFLNACSEQT AQTVTLQGET MGTTYTVKYL
51 SNNRDKLPSP AKIQKRIDDA LKEVNRQMST YQTDSEISRF NQHTAGKPLR
101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ
151 IKQAASYTGI DKIILQQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE
201 LEKYGIQNYL VEIGGELHGK GKNAHGEPWR IGIEQPNIIQ GGNTQIIVPL
251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVSDSAM
301 TADGLSTGLF VLGETEALRL AEQEKLAVFL IVRDKDGYRT AMSSEFAKLL
351 R*

This protein shows homology with a hypothetical lipoprotein precursor from H. influenzae:

sp|P44550|Y0JL_HAEIN HYPOTHETICAL LIPOPROTEIN HI0172 PRECURSOR >gi|1074292|pir I 4
hypothetical protein HI0172 - Haemophilus influenzae (strain Rd KW20)
>gi|1573128 (U32702) hypothetical [Haemophilus influenzae] Length = 346

Score = 353 bits (896), Expect = 9e-97
Identities = 181/344 (52%), Positives = 247/344 (71%), Gaps = 4/344 (1%)

Query: 7 LPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSPAKIQKR 66
+ LI +I + L AC ++T + ++L G+TMGTTY VKYL + S K +
Sbjct: 1 MKKLISGIIAVAMALSLAACQKET-KVISLSGKTMGTTYHVKYLDDGSITATSE-KTHEE 58

Query: 67 IDDALKEVNRQMSTYQTDSEISRFNQHT-AGKPLRISSDFAHVTAEAVRLNRLTHGALDV 125
I+ LK+VN +MSTY+ DSE+SRFNQ+T P+ IS+DFA V AEA+RLN++T GALDV
Sbjct: 59 IEAILKDVNAKMSTYKKDSELSRFNQNTQVNTPIEISADFAKVLAEAIRLNKVTEGALDV 118

Query: 126 TVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASLSKTHPKAYLDL 185
TVGP+VNLWGFGP+K ++P+PEQ+ + ++ GIDKI L K+ A+LSK P+ Y+DL
Sbjct: 119 TVGPVVNLWGFGPEKRPEKQPTPEQLAERQAWVGIDKITLDTNKEKATLSKALPQVYVDL 178

Query: 186 SSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQGGNTQ 245
SSIAKGFGVD+VA +LE+ QNY+VEIGGE+ KGKN G+PW+I IE+P +
Sbjct: 179 SSIAKGFGVDQVAEKLEQLNAQNYMVEIGGEIRAKGKNIEGKPWQIAIEKPTTTGERAVE 238

Query: 246 IIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVSDSAMTADGL 305
++ LNN +A+SGDYRI+ ++NGKR I+P PI H+LASI+V++ ++MTADGL
Sbjct: 239 AVIGLNNMGMASSGDYRIY-FEENGKRFAHEIDPKTGYPIQHHLASITVLAPTSMTADGL 297

Query: 306 STGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKL 349
STGLFVLGE +AL +AE+ LAV+LI+R +G+ T SS F KL
Sbjct: 298 STGLFVLGEDKALEVAEKNNLAVYLIIRTDNGFVTKSSSAFKKL 341

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
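
As an illustration of how the identity figures quoted for an alignment can be checked, the following Python sketch counts identical positions in one gapped alignment block and recomputes the summary percentages. The function names are illustrative only and are not part of any alignment program. The integer truncation of the percentages is an assumption; it happens to reproduce the figures reported above (181/344 as 52%, 247/344 as 71%, 4/344 as 1%).

```python
def identity_stats(query, subject):
    """Return (identities, gap_columns, aligned_length) for two gapped rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    pairs = list(zip(query, subject))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gap_columns = sum(1 for q, s in pairs if q == "-" or s == "-")
    return identities, gap_columns, len(pairs)


def truncated_percent(count, total):
    """Percentage rounded down to a whole number (assumed convention)."""
    return 100 * count // total


# Last block of the alignment above (query 306-349, subject 298-341).
query = "STGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKL"
sbjct = "STGLFVLGEDKALEVAEKNNLAVYLIIRTDNGFVTKSSSAFKKL"

ident, gaps, length = identity_stats(query, sbjct)
print(ident, gaps, length)

# Summary line of the full alignment, recomputed from the raw counts.
summary = [truncated_percent(n, 344) for n in (181, 247, 4)]
print(summary)
```

Summing the per-block identity counts over all six blocks should give the 181 identities reported for the whole alignment, which offers a simple consistency check on a transcribed alignment.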



Example 54 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 451>:

1 . . CCGTGCCGCC GACAGGGCGA CGACGTGTAT GCGGCGCACG CGTCCCGTCA 

51 AAAATTGTGG CTGCGCTTCA TCGGCGGCCG GTCGCATCAA AATATACGGG 

101 GCGGCGCGGC TGCGGACGGG TGGCGCAAAG GCGTGCAAAT CGGCGGCGAG 

151 GTGTTTGTAC GGCAAAATGA AGGCAGCCkA yTGGCAATCG GCGTGATGGG 

201 CGGCAGGGCC GGCCAGCACG CwTCAGTCAA CGGCAAAGGC GGTGCGGCAG

251 gCAGTGATTT GTATGGTTAT GgCGGGGgTG TTTATGCTgC GTGGCATCAG 

301 TTGCGCGATA AACAAACGGG TgCGTATTTG GACGGCTGGT TGCAATACCA 

351 ACGTTTCAAA CACCGCATCA ATGATGAAAA CCGTGCGGAA CgCTACAAAA 

401 CCAAAGGTTG GACGGCTTCT GTCGAAGGCG GCTACAACGC GCTTGTGGCG 

451 GAAGGCATTG TCGGAAAAGG CAATAATGTG CGGTTTTACC TACAACCGCA

501 GgCGCAGTTT ACCTACTTGG GCGTAAACGG CGGCTTTACC GACAGCGAGG 

551 GGACGGCGGT CGGACTGCTC GGCAGCGGTC AGTGGCAAAG CCGCGCCGGC 

601 AtTCGGGCAA AAACCCGTTT TGCTTTGCGT AACGGTGTCA ATCTTCAGCC 

651 TTTTGCCGCT TTTAATGTtt TGCACAGGTC AAAATCTTTC GGCGTGGAAA 

701 TGGACGGCGA AAAACAGACG CTGGCAGGCA GGACGGCACT CGAAGGGCGG
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751 TTCGGTATTG AAGCCGGTTG GAAAGGCCAT ATGTCCGCA. . 

This corresponds to the amino acid sequence <SEQ ID 452; ORF35>: 

1 ..PCRRQGDDVY AAHASRQKLW LRFIGGRSHQ NIRGGAAADG WRKGVQIGGE
51 VFVRQNEGSX LAIGVMGGRA GQHASVNGKG GAAGSDLYGY GGGVYAAWHQ
101 LRDKQTGAYL DGWLQYQRFK HRINDENRAE RYKTKGWTAS VEGGYNALVA
151 EGIVGKGNNV RFYLQPQAQF TYLGVNGGFT DSEGTAVGLL GSGQWQSRAG
201 IRAKTRFALR NGVNLQPFAA FNVLHRSKSF GVEMDGEKQT LAGRTALEGR
251 FGIEAGWKGH MSA..

Computer analysis of this amino acid sequence gave the following results: 
Homology with putative secreted VirG-homologue of N. meningitidis (accession number A32247)
ORF and virg-h protein show 51% aa identity in 261aa overlap: 

0r£35 5 «D^AAHASBOKl»WFIKl>|l«IKGAA-»Cra^ " 
,UT» 396 fflSHroRTIJ8MUfW>VI0GltSN0WVQGKTAPVEGlfRKSVQL6GBVFniQllBSHQtSI <55 

virg-t, 456 GL11GGWEQRSTFHNPDTDHLTTGNVKGFGAGVYATWHQLQDKQTGRYADSMMQYQRFBH 515 

20 o,o5 u. 

25 virg-h 576 SEN)WvliLLGSRQl^RVGVQAKRQFSLYiQlIMEPFWVHALYHNKPEOTQ6DGERRVI 635 

Orf35 242 AGRTALEGRFGIEAGWKGHMS 262
+TA+E + G+ K H++
virg-h 636 NNKTAIESQLGVAVKIKSHLT 656

Homology with a predicted ORF from N. meningitidis (strain A)

ORF35 shows 96.9% identity over a 259aa overlap with an ORF (ORF35a) from strain A of N.
meningitidis:



35 



55 



10 20 30 

PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG 
orf3o.pep rUIIIII II 1 1 1 1 1 1 I U M II Mill 

orf35a 

310 



: | || I || I M I 1 1 1 1 1 I U I I I I I I I I I 
ORLAI PE AE AV L Y AQQ AY AAN T L FG LRAADRG D DV Y AAD P SRQKLWLR F I GGR S HQN I RG 
;10 320 330 340 350 360 



40 _ 6A fin 70 80 90 

orf35.pep 



orf35a 

45 370 



40 50 60 70 80 90 

GAAADGWRKGVQIGGEVFVRQNEGSX1AIGVWGGRAGQHASVNGKGGAAGSDLYGYGGGV 

GAMDOTRKB Q |||||||| 1111111111111111111111111 » * \ ■ > ■ " 

<Ujj,DGR^<^ 

HO 380 390 400 410 420 



100 110 120 130 140 150 

... nm YAAWHOLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGIV 
orf35.pep ™f™ , , ,,,,,,,, H II I 1 1 I I 1 H III M M I I I I 1 1 1 I I I I M I I !l : I 

50 orf35a ^i^^ 



orf 35. pep 



430 440 450 460 470 

160 170 180 190 200 210 

GKGKSVRKM^ 

, , . ,, , i . I I | | | I i 1 1 | I M | | | | || I I I I I I I I I I I I I 1 I M 1 1 1 1 I I I I II I I II 1 1 
500 510 520 530 540 



orf35a GKGL~ 

4 90 500 510 



220 230 240 250 260 
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orf35a 



LQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSARIGYGKRTDGD 
550 560 570 590 590 600 



orf35a 



KEAALSLKWLFX 
610 620 



The complete length ORF35a nucleotide sequence <SEQ ID 453> is: 

1 ATGTTCAGAG CTCAGCTTGG TTCAAATACT CGTTCTACCA AAATCGGCGA 

51 CGATGCCGAT TTTTCATTTT CAGACAAGCC GAAACCCGGC ACTTCCCATT 

101 ATTTTTCCAG CGGTAAAACC GATCAAAATT CATCCGAATA TGGGTATGAC 

151 GAAATCAATA TCCAAGGTAA AAACTACAAT AGCGGCATAC TCGCCGTCGA 

201 TAATATGCCC GTTGTTAAGA AATATATTAC AGATACTTAC GGGGATAATT 

251 TAAAGGATGC GGTTAAGAAG CAATTACAGG ATTTATACAA AACAAGACCC 

301 GAAGCTTGGG AAGAAAATAA AAAACGGACT GAGGAGGCGT ATATAGAACA 

351 GCTTGGACCA AAATTTAGTA TACTCAAACA GAAAAACCCC GATTTAATTA 

401 ATAAATTGGT AGAAGATTCC GTACTCACTC CTCATAGTAA TACATCACAG 

451 ACTAGTCTCA ACAACATCTT CAATAAAAAA TTACACGTCA AAATCGAAAA

501 CAAATCCCAC GTCGCCGGAC AGGTGTTGGA ACTGACCAAG ATGACGCTGA 

551 AAGATTCCCT TTGGGAACCG CGCCGCCATT CCGACATCCA TATGCTGGAA 

601 ACTTCCGATA ATGCCCGCAT CCGCCTGAAC ACGAAAGATG AAAAACTGAC 

651 CGTCCATAAA GCGTATCAGG GCGGTGCGGA TTTCCTGTTC GGCTACGACG 

701 TGCGGGAGTC GGACAAACCC GCCCTGACCT TTGAAGAAAA AGTCAGCGGA 

751 CAATCCGGCG TGGTTTTGGA ACGCCGGCCG GAAAATCTGA AAACGCTCGA 

801 CGGGCGCAAA CTGATTGCGG CGGAAAAGGC AGACTCTAAT TCGTTTGCGT 

851 TTAAACAAAA TTACCGGCAG GGACTGTACG AATTATTGCT CAAGCAATGC 

901 GAAGGCGGAT TTTGCTTGGG CGTGCAGCGT TTGGCTATCC CCGAGGCGGA 

951 AGCGGTTTTA TATGCCCAAC AGGCTTATGC GGCAAATACT TTGTTCGGGC 

1001 TGCGTGCCGC CGACAGGGGC GACGACGTGT ATGCCGCCGA TCCGTCCCGT 

1051 CAAAAATTGT GGCTGCGCTT CATCGGCGGC CGGTCGCATC AAAATATACG 

1101 GGGCGGCGCG GCTGCGGACG GGCGGCGCAA AGGCGTGCAA ATCGGCGGCG 

1151 AGGTGTTTGT ACGGCAAAAT GAAGGCAGCC GGCTGGCAAT CGGCGTGATG 

1201 GGCGGCAGGG CTGGCCAGCA CGCATCAGTC AACGGCAAAG GCGGTGCGGC 

1251 AGGCAGTTAT TTGCATGGTT ATGGCGGGGG TGTTTATGCT GCGTGGCATC 

1301 AGTTGCGCGA TAAACAAACG GGTGCGTATT TGGACGGCTG GTTGCAATAC 

1351 CAACGTTTCA AACACCGCAT CAATGATGAA AACCGTGCGG AACGCTACAA 

1401 AACCAAAGGT TGGACGGCTT CTGTCGAAGG CGGCTACAAC GCGCTTGTGG 

1451 CGGAAGGCGT TGTCGGAAAA GGCAATAATG TGCGGTTTTA CCTGCAACCG 

1501 CAGGCGCAGT TTACCTACTT GGGCGTAAAC GGCGGCTTTA CCGACAGCGA 

1551 GGGGACGGCG GTCGGACTGC TCGGCAGCGG TCAGTGGCAA AGCCGCGCCG 

1601 GCATTCGGGC AAAAACCCGT TTTGCTTTGC GTAACGGTGT CAATCTTCAG 

1651 CCTTTTGCCG CTTTTAATGT TTTGCACAGG TCAAAATCTT TCGGCGTGGA 

1701 AATGGACGGC GAAAAACAGA CGCTGGCAGG CAGGACGGCG CTCGAAGGGC 

1751 GGTTCGGCAT TGAAGCCGGT TGGAAAGGCC ATATGTCCGC ACGCATCGGA 

1801 TACGGCAAAA GGACGGACGG CGACAAAGAA GCCGCATTGT CGCTCAAATG 

1851 GCTGTTTTGA 
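
The sequence listings in this section follow a fixed layout: blocks of ten residues, five blocks per row, each row labelled with the position of its first residue. The Python sketch below shows one way to produce that layout and to read it back into a plain string. The helper names `format_listing` and `parse_listing` are hypothetical and are given for illustration only.

```python
def format_listing(seq, block=10, per_row=5):
    """Lay a sequence out in labelled rows of `per_row` blocks of `block` residues."""
    seq = "".join(seq.split())  # ignore any existing spacing
    width = block * per_row
    rows = []
    for start in range(0, len(seq), width):
        row = seq[start:start + width]
        blocks = [row[i:i + block] for i in range(0, len(row), block)]
        rows.append(f"{start + 1:>5} " + " ".join(blocks))
    return "\n".join(rows)


def parse_listing(text):
    """Strip row labels and spacing from a listing, returning the bare sequence."""
    residues = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].isdigit():
            tokens = tokens[1:]  # drop the numeric row label
        residues.extend(tokens)
    return "".join(residues)


# First 60 nucleotides of SEQ ID 453 as listed above.
start_of_453 = ("ATGTTCAGAG CTCAGCTTGG TTCAAATACT "
                "CGTTCTACCA AAATCGGCGA CGATGCCGAT")
listing = format_listing(start_of_453)
print(listing)
```

Because each row label equals the number of residues preceding the row plus one, a label that does not match the running count is a quick indication that a block has been lost or duplicated in transcription.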

This encodes a protein having amino acid sequence <SEQ ID 454>: 



1 MFRAQLGSNT RSTKIGDDAD FSFSDKPKPG TSHYFSSGKT DQNSSEYGYD 

51 EINIQGKNYN SGILAVDNMP VVKKYITDTY GDNLKDAVKK QLQDLYKTRP

101 EAWEENKKRT EEAYIEQLGP KFSILKQKNP DLINKLVEDS VLTPHSNTSQ 

151 TSLNNIFNKK LHVKIENKSH VAGQVLELTK MTLKDSLWEP RRHSDIHMLE 

201 TSDNARIRLN TKDEKLTVHK AYQGGADFLF GYDVRESDKP ALTFEEKVSG 

251 QSGVVLERRP ENLKTLDGRK LIAAEKADSN SFAFKQNYRQ GLYELLLKQC

301 EGGFCLGVQR LAIPEAEAVL YAQQAYAANT LFGLRAADRG DDVYAADPSR 

351 QKLWLRFIGG RSHQNIRGGA AADGRRKGVQ IGGEVFVRQN EGSRLAIGVM 

401 GGRAGQHASV NGKGGAAGSY LHGYGGGVYA AWHQLRDKQT GAYLDGWLQY 

451 QRFKHRINDE NRAERYKTKG WTASVEGGYN ALVAEGVVGK GNNVRFYLQP

501 QAQFTYLGVN GGFTDSEGTA VGLLGSGQWQ SRAGIRAKTR FALRNGVNLQ 

551 PFAAFNVLHR SKSFGVEMDG EKQTLAGRTA LEGRFGIEAG WKGHMSARIG 

601 YGKRTDGDKE AALSLKWLF* 
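
The relationship between a nucleotide listing and the amino acid sequence it encodes can be checked by direct translation. The Python sketch below uses the standard genetic code and is an illustration only, not the software used to prepare the listings. As a simplifying assumption, any codon containing a base other than A, C, G or T is reported as X, even where the ambiguity would not change the encoded residue.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1; spacing and letter case are ignored."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # unknown or ambiguous codon -> X
        for i in range(0, usable, 3)
    )


# First 30 nucleotides of SEQ ID 453 and of SEQ ID 463, as listed in this section.
print(translate("ATGTTCAGAG CTCAGCTTGG TTCAAATACT"))
print(translate("TTGGGCATTT CCCGCAAAAT ATCCCTTATT"))
```

The two results correspond to the first ten residues given for SEQ ID 454 and SEQ ID 464 respectively. The second sequence begins with TTG, which the listing shows as leucine, in agreement with a plain standard-code translation.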



Homology with a predicted ORF from N. gonorrhoeae 

ORF35 shows 51.7% identity over a 261 aa overlap with a predicted ORF (ORF35ngh) from N. 
gonorrhoeae: 



orf 35 . pep PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG 34 

:::(:: I : I I I I I I : I : I : : I 

orf35ngh FTKVQERDDIAIYAQQAQAANTLFALRLNDKNSDIFDRTLPRKGLWLRVIDGHSNQWVQG 370






orf 35 . pep GAA- ADGWRKGVQ I GGEVFVRQNEG SXLAI GVMGGRAGQHAS VNGKG - -GAAG SDLYGYG 91 

•I -iMIIIhiiM}: III:: l:ll:|||:| }::: : : : ::: |:| 
orf35ngh KTAPVEGYRKGVQLGGEVFTWQNESNQLSIGLMGGQAEQRSTFRNPDTDNLTTGNVKGFG 430

or f 3 5 . pep GGVYAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAE 151 

: I I I I : I I I 1 : 1 1 1 I 1 1 I : I : I : i I II I : II I I I : | | : : I I I I I : I : I I I 1 1 : I I 
or f 35ngh AGVYATWHQLQDKQTGAYVDSWMQYQRFRHRINTEYATERFTSKGITASIEAGYNALLAE 4 90 

orf 35 . pep GIVGKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRN 211 

: : I t I : : I I I I I 1 1 I : I i I I I I I I : I I I : : I : I I I 1 I I I I : I : : I | : : 1 1 : I 
orf35ngh HFTKKGNSLRVYLQPQAQLTYLGVNGKFSDSENAQVNLLGSRQLQSRVGVQAKAQFAFTN 550 

orf 35 . pep GVNLQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSA 263 

1 1 : : II I : I I : : : : I I 1 1 1 : 1 1 : : : : : : : I : : I : : I : I I : I : : 
orf35ngh GVTFQPFVAVNSIYQQKPFGVEIDGDRRVINNKTVIETQLGVAAKIKSHLTLQASFNRQT 610

A partial ORF35ngh nucleotide sequence <SEQ ID 455> is predicted to encode a protein having 
partial amino acid sequence <SEQ ID 456>: 

1 . .KKLRDRNSEY WKEETYHIKS NGRTYPNIPA LFPKHPFDPF ENINNSKKIS 

51 FYDKEYTEDY LVGFARGFGV EKRNGEEEKP LRQYFKDCVN TENSNNDNCK 

101 ISSFGNYGPI LIKSDIFALA SQIKNSHINS EILSVGNYIE WLRPTLNKLT 

151 GWQEHLYAGL DPFHYIEVTD NSHVIGQTID LGALELTNSL WKPRWNSNID 

201 YLITKNAEIR FNTKNESLLV KEDYAGGARF RFAYDLKDKV PEIPVLTFEK 

251 NITGTSDIIF EGKALDNLKH LDGHQIVKVN DTADKDAFRL SSKYRKGIYT 

301 LSLQQRPEGF FTKVQERDDI AIYAQQAQAA NTLFALRLND KNSDIFDRTL 

351 FRKGLWLRVI DGHSNQWVQG KTAPVEGYRK GVQLGGEVFT WQNESNQLSI 

401 GLMGGQAEQR STFRNPDTDN LTTGNVKGFG AGVYATWHQL QDKQTGAYVD

451 SWMQYQRFRH RINTEYATER FTSKGITASI EAGYNALLAE HFTKKGNSLR

501 VYLCPQAQLT YLGVNGKFSD SENAQVNLLG SRQLQSRVGV QAKAQFAFTN 

551 GVTFQPFVAV NSIYQQKPFG VEIDGDRRVI NNKTVIETQL GVAAKIKSHL 

601 TLQASFNRQT SKHHHAKQGA LNLQWTF* 

Based on this prediction, these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 55 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 457>:

1 . . GCGGAATATG TTCAGTTCTC TATAGATTTG TTCAGTGTGG GTAAATCGGG 

51 GGGCGGTATA CCTAAGGCTA AGCCTGTGTT TGATGCGAAA CCGAGATGGG 

101 AGGTTGATAG GAAGCTTAAT AAATTGACAA CTCGTGAGCA GGTGGAGAAA 

151 AATGTTCAGG AAACGAGAAG AAGGAGTCAG AGTAGTCAGT TTAAAGCCCA 

201 TGCGCAACGA GAATGGGAAA ATAAAACAGG GTTAGATTTT AATCATTTTA 

251 TAGGTGGTGA TATCAATAAA AAAGGCACAG TAACAGGAGG GCATAGTCTA 

301 ACCCGTGGTG ATGTACGGGT GATACAACAA ACCTCGGCAC CTGATAAACA 

351 TGGGGT.TTA TCAAGCGACA GTGGAAATTN A 

This corresponds to the amino acid sequence <SEQ ID 458; ORF46>: 

1 . . AEYVQFSIDL FSVGKSGGGI PKAKPVFDAK PRWEVDRKLN KLTTREQVEK 
51 NVQETRRRSQ SSQFKAHAQR EWENKTGLDF NHFIGGDINK KGTVTGGHSL 
101 TRGDVRVIQQ TSAPDKHGXL SSDSGNX 

Further work revealed further partial nucleotide sequence <SEQ ID 459>: 

1 ..GCAGTGTGCC TnCCGATGCA TGCACACGCC TCAnATTTGG CAAACGATTC 

51 TTTTATCCGG CAGGTTCTCG ACCGTCAGCA TTTCGAACCC GACGGGAAAT 

101 ACCACCTATT CGGCAGCAGG GGGGAACTTG CCGAGCGCCA GTCTCATATC 

151 GGATTGGGAA AAATACAAAG CCATCAGTTG GGCAACCTGA TGATTCAACA 

201 GGCGGCCATT AAAGGAAATA TCGGCTACAT TGTCCGCTTT TCCGATCACG 

251 GGCACGAAGT CCATTCCCCs TTCGACAACC ATGCCTCACA TTCCGATTCT 

301 GATGAAGCCG GTAGTCCCGT TGACGGATTT AGCCTTTACC GCATCCATTG 

351 GGACGGATAC GAACACCATC CCGCCGACGG CTATGACGGG CCACAGGGCG 

401 GCGGCTATCC CGCTCCCAAA GGCGCGAGGG ATATATACAG TTACGACATA 
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451 
501 
551 



AAAGGCGTTG CCCAAAATAT CCGCCTCAAC CTGACCGACA ACCGCAGCAC 

SSacgg cttgccgacc gtttccacaa tgccggtagt atgctgacgc 
Smtagg cgacggattc aaacgcgcca cccgatacag ccccgagctg 

601 SSCGG ^AGCCTTC AACGGCACTG CAGATATCGT 

5 651 TAAAAACATC ATCGGCGCTG CAGGAGAAAT TGT 

This corresponds to the amino acid sequence <SEQ ID 460; ORF46-1>:

1 ..AVCLPMHAHA SXLANDSFIR QVLDRQHFEP DGKYHLFGSR GELAERQSHI
51 GLGKIQSHQL GNLMIQQAAI KGNIGYIVRF SDHGHEVHSP FDNHASHSDS
101 DEAGSPVDGF SLYRIHWDGY EHHPADGYDG PQGGGYPAPK GARDIYSYDI
151 KGVAQNIRLN LTDNRSTGQR LADRFHNAGS MLTQGVGDGF KRATRYSPEL
201 DRSGNAAEAF NGTADIVKNI IGAAGEI

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae

ORF46 shows 98.2% identity over a 111aa overlap with a predicted ORF (ORF46ng) from N.
gonorrhoeae:

AEYVQFSIDLFSVGKSGGGIPKAKPVFDAKPRWEVDRKLNiaTTR 45 

or«6. P ep « ii illinium inn minimi 

orf46ng PKTGVP FDGKG FPN FEKHVKYDTKLDI QELSGGG I PKAKPV FDAKPRWEVDRKLNKLTTR 217 

EQVEroWQETRRRSQSSQFlCAHAQREWENK^ 277 



orf46ng 



or f 4 6. pep RVIQQTSAPDKHGXLSSDSGN 126 

25 I I I I 1 1 1 1 1 1 1 II 1 1 1 M 1 1 

orf46ng RVIQQTSAPDKHGVLSSDSGN 298 

A partial ORF46ng nucleotide sequence <SEQ ID 461> is predicted to encode a protein having 
partial amino acid sequence <SEQ ID 462>: 

1 RRLKHCCHAR LGSAFHRKQD GAHQRFGRYG ATQRLCRSSH PRLGSPKPQC
51 RTRHRSRQQY LYGSHPHQRD WSCPGKIQLG RHHGTSCRAV ADXRDRICER
101 EIRRQRQXCR CRLGKIPSLS IPKYPLKLEQ RYGKENITSS TVPPSNGKNV
151 KLADQRHPKT GVPFDGKGFP NFEKHVKYDT KLDIQELSGG GIPKAKPVFD
201 AKPRWEVDRK LNKLTTREQV EKNVQETRRR SQSSQFKAHA QREWENKTGL
251 DFNHFIGGDI NKKGAVTGGH SLTRGDVRVI QQTSAPDKHG VLSSDSGN*

Further work revealed the complete gonococcal DNA sequence <SEQ ID 463>:

1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG 

51 CCTGCCGATG CATGCACACG CCTCAGATTT GGcaAACGAT CCCTTTATCC 

101 GqCagqttcT CGaccGTCAG CATTTCGaac ccgacggGAa ATACCaCCTA 

151 TTcqqCaGCA GGGGGGAGCT TgccnagcGC aacggccATa tcggattggG 

40 201 aaacaTAcaa Agccatcagt tGggccacct gatgattcaa "ggcggccg 

4 « rtaaaqaaaA TAtcgGctac attgtccgct tttccgatca cgggcacaaa 

111 ttccal?cgc ccttcGAcaa ccaTCCCTCA CATTCCGATT CTGACGAAGC 

Hi cggtagtccc g^tgacggat tcagccttta ccgcatccat TGGGACGGAT 

III ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT 

4S \l\ CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT 

45 H\ TGCCclS MCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC 

111 GGCTTGCCGA CCGTTTCCAC AATGCCGGCG CTATGCTGAC GCAAGGAGTA 

601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC 

Si SaATGCc gccGAAGCCT TCAACGGCAC TGCAGATATC GTCAAAAACA 

50 ill TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCagGGT 

50 751 ATAAGCGAAG GCTCAAACAT TGCTGTCATG CACGGCTTGG GTCTGCTTTC 

lm CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC 

111 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC 

111 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA TGGCAGCCAT 

« III CCCCATCAAA GGGATTGGAG CTGTCCGGGG AAAATACGGC TTGGGCGGCA 

55 1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGCGAT CGCATTGCCG 

loSl AAAG^AAAT ccgccgtcag cgacaatttt gccgatgcgg catacgccaa 

no} CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC 
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1151 
GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGC
1201 AAAAATGTCA AACTGGCAGA CCAACGCCAC CCGAAGACAG GCGTACCGTT
1251 TGACGGTAAA GGGTTTCCGA ATTTTGAGAA GCACGTGAAA TATGATACGA
1301 AGCTCGATAT TCAAGAATTA TCGGGGGGCG GTATACCTAA GGCTAAGCCT
1351 GTGTTTGATG CGAAACCGAG ATGGGAGGTT GATAGGAAGC TTAATAAATT
1401 GACAACTCGT GAGCAGGTGG AGAAAAATGT TCAGGAAACG AGAAGAAGGA
1451 GTCAGAGTAG TCAGTTTAAA GCCCATGCGC AACGAGAATG GGAAAATAAA
1501 ACAGGGTTAG ATTTTAATCA TTTTATAGGT GGTGATATCA ATAAGAAAGG
1551 CACAGTAACA GGAGGGCATA GTCTAACCCG TGGTGATGTA CGGGTGATAC
1601 AACAAACCTC GGCACCTGAT AAACATGGGG TTTATCAAGC GACAGTGGAA
1651 ATTAAAAAGC CTGATGGAAG TTGGGAGGTG AAAACGAAAA AAGGTGGGAA
1701 AGTGATGACC AAGCACACCA TGTTCCCAAA AGATTGGGAT GAGGCTAGAA
1751 TTAGGGCTGA AGTTACTTCG GCTTGGGAAA GTAGAATAAT GCTTAAGGAT
1801 AATAAATGGC AGGGTACAAG TAAATCGGGT ATTAAAATAG AAGGATTTAC
1851 CGAACCTAAT AGAACAGCAT ATCCCATTTA TGAATAG



This corresponds to the amino acid sequence <SEQ ID 464; ORF46ng-1>:

1 LGISRKISLI LSILAVCLPM HAHASDLAND PFIRQVLDRQ HFEPDGKYHL
51 FGSRGELAXR NGHIGLGNIQ SHQLGHLMIQ QAAVEGNIGY IVRFSDHGHK
101 FHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY
151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLADRFH NAGAMLTQGV
201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG
251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP
301 NAAQGIEAVS NIFMAAIPIK GIGAVRGKYG LGGITAHPVK RSQMGAIALP
351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG
401 KNVKLADQRH PKTGVPFDGK GFPNFEKHVK YDTKLDIQEL SGGGIPKAKP
451 VFDAKPRWEV DRKLNKLTTR EQVEKNVQET RRRSQSSQFK AHAQREWENK
501 TGLDFNHFIG GDINKKGTVT GGHSLTRGDV RVIQQTSAPD KHGVYQATVE
551 IKKPDGSWEV KTKKGGKVMT KHTMFPKDWD EARIRAEVTS AWESRIMLKD
601 NKWQGTSKSG IKIEGFTEPN RTAYPIYE*



ORF46ng-1 and ORF46-1 show 94.7% identity in 227 aa overlap:



orf 4 6-1. pep 
orf 4 6ng-l 

orf 4 6-1 .pep 
orf 46ng-l 

orf 46-1 .pep 
orf 4 6ng-l 

orf 46-1 .pep 
orf 4 6ng-l 

orf 46-1. pep 
orf 4 6ng-l 



10 20 30 40 

AVCLPMHAHASXLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER 
I 1 I I t I I I I t I III! I I I I I II I I I I I I I I II I I I I II I I I I I 
LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR 
10 20 30 40 50 60 

50 60 70 80 90 100 

QS H I G LGKI QS HQLGNLM I QQAAI KGN IG Y I VRFS DHGHE VH S P FDNHASHS DS DEAGS P 
: : 1 1 1 1 I : 1 1 1 I ! I I : I I I I I I I : : I I I I 1 1 1 1 1 1 1 II I : I i It I I I I I 1 1 1 1 1 I I I I I 
NGHIGLGNIQSHQLGHLMIQQAAVEGNIGYIVRFSDHGHKFHSPFDNHASHSDSDEAGSP 
70 80 90 100 110 120 

110 120 130 140 150 160 

VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 
I I I I I I I I I I I II I I M I I I I I I I I II I I I I I M I I I I I I I I II I II I II I til I I I I I I 
VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 
130 140 150 160 170 180 

170 180 190 200 210 220 

TGQRLADRFHNAGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADI VKNIIGAAGE 
I I I I I I i i I I I I i: I M I II M I I I I II M I It I I I I M I I M M I I I Ml i Ml HI I I 
TGQR LADR FHN AGAMLTQG VG DG FKRATR Y S PE L DRSGN AAEAFNGTADI VKN 1 1 G AAGE 
190 200 210 220 230 240 



I 
I 

IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP 
250 260 270 280 290 300 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF46ng-l shows 87.4% identity over a 486aa overlap with an ORF (ORF46a) from strain A of 
N. meningitidis: 
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20 




or £4 6a. pep 
orf 46ng-l 

orf 46a. pep 
orf 46ng-l 



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WISMCISLIWXUWCLMHAHAS^ 

10 20 30 40 

70 80 90 100 HO Hi 

• i i i i i i i i i i i i i • i • i I I I I : : I I I I I I I I 1 : 1 I I I M i I i ' ' i 1 1 1 1 i 
70 80 90 10° 



130 140 150 160 170 180 

VOGFSLYRIOTDGYEHHPADGYDGPQGGGYPAPKC^DIY^ 
,77,7 , , , , , , , , I . , , , 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I I It M 1 1 M II I I II 1 1 m II 1 

130 140 160 l'U iov; 

190 200 210 220 230 240 

20 o*.6»,-l ^U^i^^S^DK^^CT^IVKHnC^ 



orf 4 6a. pep 
15 orf46ng-l 

orf46a.pep 



25 orf 46a. pep 

orf46ng-l 



orf 46a. pep 



190 

250 260 270 2B0 290 300 

,ii 1 1 i i i i i I M 1 1 1 1 II 1 1 1 II 1 1 I I I M I I I II 1 1 I I I M I I M I 1 1 in 1 1 1 

IW3AGDAVQGISEGSNIAVI^ 

250 260 270 280 290 JUU 

-.10 320 330 340 350 360 

HAAQGIEAVSNIFTAVIPVKGIGAVRGKYG^^ 

orf46ng . x mmuu 

35 370 380 390 400 410 420 

ADAAYAKYPS PYHSRN IRSNLEQRYGKEN ITS 

ADAAYMCYPSPYHS^IRSNI£QRYGKENITSSTVPPSNGKNVK^ 

370 380 390 400 410 

430 440 450 460 470 

orf46a. P ep GFPNFEKDVKYDTRINTAVPQVN PIDEPVin<--PKGS^ 



orf 46a. pep 
orf 46ng-l 



45 orf 46ng-l 



GFPNFTFUIVKYDTKIjD — IQELSGGGIPKAKPVFDAKPRWEVDWCLN-KLTTREQVEKNV 
430 440 450 460 4/0 

• R0 490 500 510 520 530 

orf 46a .pep rqgrirYIPPKNYSPSAPLPKGPKNGYLDKFGNEWTKGPSRTKGQEFEWDVQLSKTGREQ 

orf46ng-l qetpj^qssqfwrewe^ 



The complete length ORF46a DNA sequence <SEQ ID 465> is: 






TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG 

si SSgatg catgcacacg cctcagattt ggcaaacgat tcttttatcc 

101 ggcaggttct cgaccgtcag catttcgaac ccgacgggaa ataccaccta 

ill TTCGGCAGCA GGGGGGAACT TGCCGAGCGC AGCGGTCATA TCGGATTGGG 

ll\ aaA^TACAA AGCCATCAGT TGGGCAACCT GTTCATCCAG CAGGCGGCCA 

HI ttaaaGGAAA TATCGGCTAC ATTGTCCGCT TTTCCGATCA CGGGCACGAA 

HI CTCCATTCCC CCTTCGACAA CCATGCCTCA CATTCCGATT CTGATGAAGC 

\l\ SgSSccc GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT 

Hi acgaacaSa tcccgccgac ggctatgacg ggccacaggg CGGCGGCTAT 

\ll CCCG^CCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCCT 

HI tcccSSat atccgcctca acctgaccga caaccgcagc accggacaac 

HI CCGTTTCCAC AATACCGGTA GTATGCTGAC GCAAGGAGTA 

65 III gScggat tcaaacgcgc cacccgatac agccccgagc tggacagatc 

HI Sgcaatgcc gccgaagctt tcaacggcac tgcagatatc gtcaaaaaca 

701 tIat^gcgc ggcaggagaa attgtcggcg caggcgatgc cgxgcagggt 

7 c, ATAAGCGAAG GCTCAAACAT TGCTGTTATG CACGGCTTGG GTCTGCTTTC 

70 HI CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC 



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851 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC 

901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA CGGCAGTCAT 

951 CCCCGTCAAA GGGATTGGAG CTGTTCGGGG AAAATACGGC TTGGGCGGCA 

1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGAGAT CGCATTGCCG 

1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA

1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC 

1151 GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGA 

1201 AAGAATGTGA AACTGGCAAA CAAACGCCAC CCGAAGACCA AAGTGCCGTT 

1251 TGACGGTAAA GGGTTTCCGA ATTTTGAAAA AGACGTAAAA TACGATACGA 

1301 GAATTAATAC CGCTGTACCA CAAGTGAATC CTATAGATGA ACCCGTCTTT

1351 AATCCTAAAG GTTCTGTCGG ATCGGCTCAT TCTTGGTCTA TAACTGCCAG

1401 AATTCAATAC GCAAAATTAC CAAGGCAAGG TAGAATCAGA TATATCCCAC

1451 CTAAAAATTA CTCTCCTTCA GCACCGCTAC CAAAAGGACC TAATAATGGA

1501 TATTTGGATA AATTTGGTAA TGAATGGACT AAAGGTCCAT CAAGAACTAA

1551 AGGTCAAGAA TTTGAATGGG ATGTTCAATT GTCTAAAACA GGAAGAGAGC

1601 AACTTGGATG GGCTAGTAGG GATGGTAAGC ATTTAAATAT ATCAATTGAT 

1651 GGAAAGATTA CACACAAATG A 

This corresponds to the amino acid sequence <SEQ ID 466>: 

1 LGISRKISLI LSILAVCLPM HAHASDLAND SFIRQVLDRQ HFEPDGKYHL
51 FGSRGELAER SGHIGLGNIQ SHQLGNLFIQ QAAIKGNIGY IVRFSDHGHE
101 VHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY
151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLVDRFH NTGSMLTQGV
201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG
251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP
301 NAAQGIEAVS NIFTAVIPVK GIGAVRGKYG LGGITAHPVK RSQMGEIALP
351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG
401 KNVKLANKRH PKTKVPFDGK GFPNFEKDVK YDTRINTAVP QVNPIDEPVF
451 NPKGSVGSAH SWSITARIQY AKLPRQGRIR YIPPKNYSPS APLPKGPNNG
501 YLDKFGNEWT KGPSRTKGQE FEWDVQLSKT GREQLGWASR DGKHLNISID
551 GKITHK*

Based on this analysis, including the presence of an RGD sequence in the gonococcal protein, typical
of adhesins, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
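
The RGD sequence referred to above can be located by a simple string search. The Python sketch below is illustrative only; the function name `find_motif` is hypothetical, and the fragment is taken from residues 511-540 of the gonococcal sequence <SEQ ID 464> as listed in this Example.

```python
def find_motif(protein, motif="RGD"):
    """Return the 1-based start positions of every occurrence of `motif`."""
    seq = "".join(protein.split()).upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(motif, start + 1)  # allow overlapping occurrences
    return positions


# Residues 511-540 of SEQ ID 464 (blocks copied from the listing).
fragment = "GDINKKGTVT GGHSLTRGDV RVIQQTSAPD"
offset = 511
hits = [offset + p - 1 for p in find_motif(fragment)]
print(hits)
```

A match of this kind only shows that the tripeptide is present; whether it is accessible and functional as an adhesion motif is not something a sequence search can establish.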



Example 56

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 467>: 

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTG. . . 

This corresponds to the amino acid sequence <SEQ ID 468; ORF48>:

1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 
51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI 
101 NLVPFILTAP APYQIMTGL. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 469>: 

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG 
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WO 99/24578 
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401 CCGCCGCCAA AACCGACTTC CGGCACATTG CCGTCTGCGC CGCCGTTGTG 

451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGTCG 

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTACTACGCC AAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG 

601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA

651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG 

751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA

951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA

1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGCCTGGCT

1401 GAACTTCAAA ATCAAATAA 

This corresponds to the amino acid sequence <SEQ ID 470; ORF48-1>:

1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN
51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI
101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAVCAAVV
151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL
201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL
251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR
301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC
351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC
401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG
451 NLNETFRYLK QGHVAWLNFK IK*
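
Where a partial sequence is later extended to a complete one, as with ORF48 and ORF48-1 in this Example, the partial sequence should be found unchanged inside the complete one. The Python sketch below shows one way to confirm this from the listings. The helper names are hypothetical; row labels, spacing and the dots marking an incomplete end are discarded before comparison.

```python
def normalise(listing):
    """Keep only the letters of a listing, dropping labels, spaces and dots."""
    return "".join(ch for ch in listing.upper() if ch.isalpha())


def locate_fragment(partial, complete):
    """Return the 1-based position of `partial` within `complete`, or 0 if absent."""
    return normalise(complete).find(normalise(partial)) + 1


# Last row of the partial ORF48 listing and the matching rows of ORF48-1.
partial_row = "101 NLVPFILTAP APYQIMTGL. . ."
complete_rows = "101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAVCAAVV"

print(locate_fragment(partial_row, complete_rows))
```

A result of 0 would indicate that the two listings disagree somewhere in the overlap, which is worth resolving before either sequence is used further.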

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)
ORF48 shows 94.1% identity over a 119aa overlap with an ORF (ORF48a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf48 pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATA RPIVNLDYLPAALLI 
IN III I III II MINI MUM Mill | | I I I I 11 I I I 1 I I II I I II I IIIIIIM 
40 orf48a MNIHTLL SKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATAR PIVNLXYLPAALLI 

10 20 30 40 50 60 

70 80 90 100 110 119 

0rf48 pep ALPWRFVKIAG VIAFVnAVLFDGLMMVI Q LFPFMDLIGAINLVPFI LTAPAPYOIMT.GL 
45 MIM HI | M I I I I I I I I II I I I I I M 1 I I I I I I I M I I I M I I I I M I I I I I 
or f 4 8a ALPWRXWIXG VIJOCWLA\a,FDGU^I Q LFPFMDLIGAINLVPFI XTAPALYQI^GLL 
70 80 90 100 110 120 

orf48a LLYMLAM PFVLQKAAAKT D FRHIAACAAVWAAGY FTGHLSXYDRGRMAN I FGANN FYYA 

50 130 140 150 160 170 180 

The complete length ORF48a nucleotide sequence <SEQ ID 471> is:

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTNNCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

55 151 TTGGANTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTNTCGT 

201 CAAAATTGNC GGCGTATTGG CGTNTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCNT GACCGCCCCC GCCCTTTATC AGATAATGAC 

351 CGGGCTGTTA CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG 

60 401 CCGCCGCCAA AACCGACTTC CGACACATTG CCGCCTGTGC CGCCGTTGTG 

451 GTGGCAGCCG GCTATTTTAC CGGCCATTTG AGTTANTACG ACCGGGGGCG

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCC AAAAGTCAGG

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG

601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA

651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT

701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG

751 CTGGCGCAAA AAGANCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT

801 CATCGGCGCG ACGATCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA

951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG

1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC

1101 ANTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA

1151 GCCACGCCGA CTATCCCGAA TCNGACATTT TCAACCACAG GCTCAAATGC

1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC NTCTGCCGCA ATTTCAGCCT

1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGNCTGGCT

1401 GAACTTCAAA ATCAAATAA



This encodes a protein having amino acid sequence <SEQ ID 472>:

1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLX PNAVFWVLAL LTATARPIVN
51 LXYLPAALLI ALPWRXVKIX GVLAXWLAVL FDGLMMVIQL FPFMDLIGAI
101 NLVPFIXTAP ALYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAACAAVV
151 VAAGYFTGHL SXYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL
201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL
251 LAQKXRFSVW ESGSFPFIGA TIEGEMRELC AYGGLRGFAL RRAPDEKFAR
301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC
351 AIFGGVCDSE LFGEVSAXFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC
401 TEYGLPAETD XCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG
451 NLNETFRYLK QGHVXWLNFK IK*



ORF48a and ORF48-1 show 96.8% identity in 472 aa overlap: 
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10 20 30 40 50 60 

orf 4 8a . pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI 
I | 1 1 1 I I I I 1 I ! I t I I I I I I I I I I 1 1 I I I MMMMM1MM1MIM I1IIIIM 
orf 48-1 MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 4 8 a. pep ALPWRXVKIXGVIJOCWIAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL 
Hill Ml mi llllllllllllllllllilllltMllll! Mil 1 1 i 1 1 1 I I 

orf 48-1 ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 48a . pep IXYHIJ^PFVl^KAAAKTDFRHIAACAAVVVAAGYFTGHLSXYDRGRMANIFGANNFYYA 
t | 1 I I I i I I i 1 I 1 I I I I 1 I I 1 I M : I 1 I t 1 : I 1 1 I I I I I 1 I imiiimiillllll 
or f 4 8 - 1 LLYMLAMPFVLQKAAAKT D FRHI AVCAAVVAAAGYFTGHLS YY DRGRMAN I FGANN FYYA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 48a. pep KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPAK? 

II! III! | II I IN I II II I IM II I M M I t I I I II Mill I I I I I I M II II II II II 
orf 48-1 KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf 48a . pep ELQNATFAKLLAQKXRFSVWESGSFPFIGATIEGEMRELCAYGGLRGFALRRAPDEKFAR 

II || i | H | II I I I II I M I M I M M I M : M M M M I I M M M M M M I I I 11 I 
orf 48-1 ELQNAT FAKLLAQKDRFS VWESGS FPFIGATVEGEMRELCAYGGLRG FALRRAPDEKFAR 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 48a. pep CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE 

I I I 1 1 I I I 1 1 II I I I I I I M I M M M M I M M I 1 M M M I I I M I 

orf 48-1 CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE 

310 320 330 340 350 360 
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370 380 390 400 410 420 

or f 4 8a . pep LFGEVSAXFKKHDKGLFyWbfTLTSHADYPESDlFNHRI^CTEYGLPAETDXCBNFSLHTQ 

i ) I | | | | | I I t I I I 1 1 I I t I I 1 | i | | | | I I IM I MUM MUM I II IN i 

orf 48-1 LFGEVSAFFKKHDKGLFYWMTLTSHADY PE S D I FNHRLKCTE YGL PAET DLCRN FS LHTQ 

5 370 380 390 400 410 420 

430 440 450 460 470 

orf 48a . pep ffdqladliqrpemkgteviivgdhpppvgnlnetfrylkqghvxwlnfkikx 
IIIIMIIIIIIIilllllllllllMtlllMMMIIMMl I M I I 1 I I 
10 orf 4 8-1 FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLN FKIKX 

430 440 450 460 470 

Homology with a predicted ORF from N. gonorrhoeae

ORF48 shows 97.5% identity over a 119aa overlap with a predicted ORF (ORF48ng) from N.
gonorrhoeae:

orf 48 .pep MNIHTLLSKQWTLPPFLPKRLLLSLLIL1APNAVFWVLALLTATARPIVNLDYLPAALLI 60 

I I f I : I 1 I : I I I I I I I 1 1 M 1 1 1 I I I I II I I I M I M I I I I I I I I I! I 11 I I I I I M I I I 
orf48ng r^IHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60 

20 orf 48 .pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL 119 

I | I I | M I I I I I I M I I 1 1 I I I I I 1 I I I 1 t I I 1 I I 1 I 1 I I I I I I 1 I I 1 I I I ! 1 I I I I I 
orf48ng ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 120 

The ORF48ng nucleotide sequence <SEQ ID 473> was predicted to encode a protein having amino 
acid sequence <SEQ ID 474>: 

1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN
51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI
101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAVV
151 AAARYFTGPF ELLRTGGRWQ YVQHRRLLLS GSRASFRRRQ KADVLRRLGN
201 PYASMGNGG. .

Further work identified the complete gonococcal DNA sequence <SEQ ID 475>:

1 ATGAATATTC ACGCCCTGCT CTCCGAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTGGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCC GGCGGTTTTG TTTGACGGGC

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGACCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAAAAAG 

401 CCGCCGTCAA AACCGACTTC CGACACATTG CCGTCTGTGC CGCCGTTGTG 

451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGGCG

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCc aAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGgcctG 

601 GTCGACCCCG TCTTCCTCCC CTTGGGCAAT CAGCAGCGTG CCGCCACGCG 

651 GCTGAGTGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGGCAATCCC GAGCTTCAAA ACGCCACTTT TGCCAAACTG

751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAATTGTGC GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGT AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG

1001 GCTTTCAAAA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATACG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT

1251 GCACACCCAA TtcttcgACC AACTGGCGGA TTTGATCCGA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGACACG TCGCCTGGCT 

1401 GCACTTCAAA ATCAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 476; ORF48ng-1>:

1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN
51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI
101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAVV
151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL
201 VDPVFLPLGN QQRAATRLSE PKSQKILFIV AESWGLPGNP ELQNATFAKL
251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR
301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQKIKT AENLIGKKTC
351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC
401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIR RPEMKGTEVI IVGDHPPPVG
451 NLNETFRYLK QGHVAWLHFK IK*
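The correspondence between the gonococcal DNA sequence <SEQ ID 475> and its encoded protein <SEQ ID 476> can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative and are not taken from this document, which does not state what software was used. Any codon containing an ambiguous base (such as N) is reported as X, which is a simplification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon.

    Whitespace (as used between ten-base blocks in the listings) is
    ignored and lower-case bases are accepted. Codons that contain
    anything other than A, C, G or T are translated as 'X'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of the gonococcal sequence, as printed in the listing.
print(translate("ATGAATATTC ACGCCCTGCT CTCCGAACAA"))
```

Applied to the first thirty bases of <SEQ ID 475>, the sketch gives the first ten residues of <SEQ ID 476>; applied to a stretch containing N, it gives X at the affected residue, as in the ORF48a listing.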



ORF48ng-1 and ORF48-1 show 97.9% identity in 472 aa overlap:
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orf48-l .pep 
orf48ng-l 

orf 48-1 .pep 
orf48ng-l 

orf48-l.pep 
orf 48ng-l 

orf 48-1 .pep 
orf48ng-l 

orf48-l.pep 
orf48ng-l 

orf 48-1. pep 
orf 48ng-l 

orf 48-1 .pep 
orf 48ng-l 

orf48-l.pep 
orf 48ng-l 



10 20 30 40 50 60 

MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 
I 11 I I I I I M M I I I t I I I I I I I M I I 1 I I III II I II I I M I M M I 
MN I HALLSEQWTLPPFLPKRLLLSLLI LLAPNAVFWVLALLTATARP I VNLDYLPAALLI 

10 20 30 40 50 60 

70 80 90 100 110 120 

ALPWRFVKIAGV1AFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 

1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 iiiiiimMimiHiiiiMMiiMiiimiiiim 

ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 
70 80 90 100 110 120 

130 140 150 160 170 180 

LLYMLAMPFVLQKAAAKTDFRH I AVCAAVV AAAGYFTGHL SYYDRGRMAN IFGANNFYYA 

i jMiimiimciii iiiiMiimiiiiiiimiHimiiiiMi 

LLYMLAM^FVLQKAAVKTDFRHIAVCAAWAAAGYFTGHLSYYDRGRMAN IFGANNFYYA 
130 140 150 160 170 180 

190 200 210 220 230 240 

KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP 

| 1 | | 1 I I I I i ] 1 I I I I 1 I I I I I I I I M I I III II I I: I: I I I I I I II I I I I Ml I I 1:1 I 
KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATRLSEPKSQKILFIVAESWGLPGNP 
190 200 210 220 230 240 

250 260 270 280 290 300 

ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR 

I | | | | | | | | I || II II II I I II I M M II II I I II I I I I M II I M I M I I M M II I II 
ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR 

250 260 270 280 290 300 

310 320 330 340 350 360 

CLPNRLKQEG YATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE 

I 1 | | | | | | | | | | I I 1 I I I I I I I I I I I M I I I I II I I : I I I M I I I ! I ! I I I II I M II I I 
CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQKIKTAENLIGKKTCAIFGGVCDSE 

310 320 330 340 350 360 

370 380 390 400 410 420 

LFGEVSAFFKKHDKGLFYWMTLTSHADYPE5DIFNHRLKCTEYGLPAETDLCRNFSLHTQ 

| | | | | | || | 1| || II II II II II II M M I I ! II I II I M I M I I I I I II I M I M 11 I I 
LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ 
370 380 390 400 410 420 

430 440 450 460 470 

FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX 

| | | | | | | | | : I II II II M M II I I II I I M II I 1 1 M I II I II I I I : M I II 
FFDQLADLIRRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLHFKIKX 

430 440 450 460 470 
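The identity figures quoted for these alignments are the fraction of aligned positions at which the two sequences carry the same residue. A minimal sketch of that calculation for two pre-aligned, ungapped sequences follows; `percent_identity` is an illustrative helper, not the program used to produce the alignments in this document, and it does not handle gaps or score conservative substitutions.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions with identical residues in two
    pre-aligned sequences of equal length (no gap handling)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# First 60 residues of ORF48-1 and ORF48ng-1, in ten-residue blocks.
ORF48_1 = ("MNIHTLLSKQ" "WTLPPFLPKR" "LLLSLLILLA"
           "PNAVFWVLAL" "LTATARPIVN" "LDYLPAALLI")
ORF48NG_1 = ("MNIHALLSEQ" "WTLPPFLPKR" "LLLSLLILLA"
             "PNAVFWVLAL" "LTATARPIVN" "LDYLPAALLI")

print(round(percent_identity(ORF48_1, ORF48NG_1), 1))
```

For instance, ten mismatches over 472 aligned positions would give 462/472, or 97.9%.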



60 



Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and two putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 57

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 477>:

1 . . GTGAGCGGAC GTTACCGCGC TTTGGATCGC GTTTCCAAAA TCATCATCGT 

51 TACTTTGAGT ATCGCCACGC TTGCCGCCGC CGGCATCGCT ATGTCGCGCG 

101 GTATGCAGAT GCAGTCCGAT TTTATCGAGC CGACACCGTG GACGCTTGCC

151 GGTTTGGGCT TCCTGATCGC GCTGATGGGC TGGATGCCCG CGCCGATTGA 

201 AATTTCCGCC ATCAATTCTT TGTGGGTAAC CGAAAAACAA CGCATCAATC 

251 CTTCCGAATA CCGCGACGGG ATTTTTGAAT TCAACGTCGG TTATATCGCC 

301 AGTGCGGTTT TGGCTTTGGT TTTCCTTGCA CTGGGCGC.G TAGCGCCGAA 

351 CGGCAACGGC GA.ACAGTGC AGATGGCGGG CGGCAAATAT AACGGGCAAT

401 TGATCAATAT GTACGCC . . 

This corresponds to the amino acid sequence <SEQ ID 478; ORF53>: 

1 . . VSGRYRALDR VSKIIIVTLS IATLAAAGIA MSRGMQMQSD FIEPTPWTLA 
51 GLGFLIALMG WMPAPIEISA INSLWVTEKQ RINPSEYRDG IFEFNVGYIA 
101 SAVLALVFLA LGXVAPNGNG XTVQMAGGKY NGQLINMYA. .

Further work revealed the complete nucleotide sequence <SEQ ID 479>: 

1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG 

51 TCCGGGGATC ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG 

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC 

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC 

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT 

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT 

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT 

401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT

451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA 

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG 

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA 

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG 

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG 

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT 

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG 

901 ACGATTACCG TCGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA 

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC 

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC 

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTTAAAGGT GATGAAAAAC 

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA 

1251 ATGA 
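The listings in this section print each sequence as rows of five ten-character blocks, with every row prefixed by the position of its first residue (1, 51, 101 and so on). A minimal sketch of a formatter that reproduces this layout is given below; `format_listing` is an illustrative name and the block and row sizes are simply read off the listings above.

```python
def format_listing(seq: str, block: int = 10, per_row: int = 5) -> list[str]:
    """Lay out a sequence as numbered rows of space-separated blocks.

    Each row starts with the 1-based position of its first residue,
    followed by up to `per_row` blocks of `block` characters.
    Whitespace in the input is ignored.
    """
    seq = "".join(seq.split())
    row_len = block * per_row
    rows = []
    for start in range(0, len(seq), row_len):
        chunk = seq[start:start + row_len]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        rows.append(f"{start + 1} " + " ".join(blocks))
    return rows


# First 60 bases of the complete sequence shown above.
for row in format_listing(
    "ATGTCCGAACAACATATTTCGACTTGGAAAAGTAAAATCAACGCATTGGGTCCGGGGATC"
):
    print(row)
```

Because the row prefix is determined entirely by the row length, a listing whose columns have been separated from their position numbers can be re-associated by counting rows.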

This corresponds to the amino acid sequence <SEQ ID 480; ORF53-1>:

1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAGA LYGWQIALII
51 ILTNLFKYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLWVF LILCILSATI
101 NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL IMASCLIILV SGRYRALDRV
151 SKIIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPWTLAG LGFLIALMGW
201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGYIAS AVLALVFLAL
251 GAFVQYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPLVA FIAFACMYGT
301 TITVVDGYAR AIAEPVRLLR GKDKTGNAEF FAWNIWVAGS GLAVIFWFDG
351 VMANLLKFAM IAAFVSAPVF AWLNYRLVKG DEKHKLTSGM NALALAGLIY
401 LTGFTVLFLL NLAGMFK*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 
ORF53 shows 93.5% identity over a 139aa overlap with an ORF (ORF53a) from strain A of N.
meningitidis: 
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orf53a 
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10 20 30 

VSGRYRAT.npwQW TTTVTLSIATLAAAGIA 

i 1 1 1 1 1 1 1 1 1 M | I 1 1 1 M I M 1 1 1 1 1 1 1 1 

AAIV TOIAIPSI^ FDAGWAATTMASCLIILVSGRYRALDRVSKTTTVTLSIATLAAAGIA 
110 120 130 H6 150 160 

40 so 60 70 80 90 

M cpr:MrwnsnFTRPTPWTT.ti f:T^FLlAI^GWMPA PIEISAINSLWVTEKQRINPSEYRDG 
I | | M Ml HI I II III ||H I I Ml I I Ml I I I HMMI t I I I I I M Ml I I I I III I 
orf53a M c P r.NinMn^nFTFPTPMTT.& r,T^FLlALMGWMPA PIEISAINSLWVTEKQRINPSEYRDG 
170 180 "190 200^ 210 220 

100 110 120 130 139 

orf53 dgd IFEFNVGY IASAVLALVFLALGXV APNGNGXTVQMAGGKYKGQLIWMYA 

I till II 111 II 1 1 II I Ml II : IN :llllllll I MUM I 
orf53a I FDFNVGYIASAVIJVLVFLALGAJVQYGNGEAVQMAGGKYIGQLIN 

230 240 250 260 270 280 

Q ^ f53a AFIAFACMYGTTITW DGYARAIAEPVRLLRGKDKTGKAEFFAWN IWVAGSGLAVI FWFD 

290 300 310 320 330 340 

The complete length ORF53a nucleotide sequence <SEQ ID 481> is:

1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG 

51 ACCGGGGATT ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG 

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC 

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA 

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC 

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT 

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT 

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT 

401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT

451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA 

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG 

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG 

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT 

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG 

901 ACGA^TACCG TTGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG 

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA 

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC 

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC 

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTCAAAGGT GATGAAAAAC 

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT 

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA 

1251 ATGA 

This encodes a protein having amino acid sequence <SEQ ID 482>: 

1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAGA LYGWQIALII
51 ILTNLFKYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLWVF LILCILSATI
101 NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL IMASCLIILV SGRYRALDRV
151 SKIIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPWTLAG LGFLIALMGW
201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGYIAS AVLALVFLAL
251 GAFVQYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPLVA FIAFACMYGT
301 TITVVDGYAR AIAEPVRLLR GKDKTGNAEF FAWNIWVAGS GLAVIFWFDG
351 VMANLLKFAM IAAFVSAPVF AWLNYRLVKG DEKHKLTSGM NALALAGLIY
401 LTGFTVLFLL NLAGMFK*

ORF53a shows 100.0% identity in 417 aa overlap with ORF53-1:

10 20 30 40 50 60 

orf53a Deo mSEQHISTWKSKINALGPGIKMASAAVGGSHLIASTQAGALYGWQIALIIILTNLFKYPF 

P M M M II 1 1 1 1 I II I M 1 1 I M I I I I I HUM II I Mil Mi Mill 

orf53-l mSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALIIILTNLFKYPF 
10 20 30 40 50 60 
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orf 53a. pep 
orf53-l 

orf53a.pep 
orf53-l 

orf53a.pep 
orf53-l 

orf53a.pep 
orf53-l 

orf53a.pep 
orf53-l 

orf 53a. pep 
orf53-l 



FRFSAHYTLDTGKSLIEGYAEKSR^LWVFLILCILSATI^GAVAIVTAAIVKMAIPSL 

IIMMIMMI I Mill MMtlllllHIIIt II II I Mil 1,11 1 iiliJ 

FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL 

70 80 90 100 HO 120 

130 140 150 160 170 180 

MFDAGTVAALIMASCLI I LVSGRYRALDRVSKI 1 1 VTLS I ATLAAAGI AMSRGMQMQSDF 

lllllllf IIMMIM M I II M I M I H I I M I II M IN 11 I I M I I HI I 

MFDAGTVAALIMASCLI I LVSGRYRALDRVSKI 1 1 VTLS I AT LAAAG I AMSRGMQMQSPF 

130 140 150 160 170 180 

190 200 210 220 230 240 

IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS 

M | | | t | | I I M I II I I I II I I I II I I I I I I I I I I I I II M I I I I I I I II I II MM I I I 
IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS 

190 200 210 220 230 240 

250 260 270 280 290 300 

AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT 
M I M 1 I M I M I I I M I I M I M II It I I I I I ! 1 M I I M I I M I I I I M M I M M 1 I 
AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT 

250 260 270 280 290 300 

310 320 330 340 350 360 

T I T WDGYARAIAE PVRLLRGKDKTGNAE FFAWN I WVAGS GLAV I FWFDGVMANLLKFAM 
U U II M I M M I I I I I t 1 i I I 11 M 1 M I I M I 1 I f t M 1 I 1 1 M I t M M I I I I I I I 
T I T WDG YARAI AE PVRLLRGKDKTGNAE FFAWN I WVAGS GLAV I FW FDG VMANLLKFAM 

310 320 330 340 350 360 

370 380 390 400 410 

IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX 

I | | | | | | II I I M I M I II I i M I I I I I I M I II I I I I I I I I II M I I I III I I I I I I 
IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX 

370 380 390 400 410 



Homology with a predicted ORF from N. gonorrhoeae 

ORF53 shows 92.1% identity over a 139aa overlap with a predicted ORF (ORF53ng) from N.
gonorrhoeae:

orf 53. pep 
orf 53ng 
orf 53. pep 
orf53ng 
orf 53. pep 
orf53ng 



VSGRYRALDRVSKI 1 1 VTLS I AT LAAAG I A 
I I I I I I II I I I I I I I I I t I 1 1 I I I I I M I I 
AAIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIA 

MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG 

IIMIIIt 1 I I I 1 k I 1 I 1 1 I i I I I I t I 1 I I t 1 1 I t I I I t I I 1 1 1 I 1 I ! 1 1 I I I 1 

MSRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG 

IFEFNVGY IASAVLALVFLALGXVAPNGNGXTVQMAGGKYNGQL INMYA 

I I : I I 1 1 I I I I 1 ! 1 I I I ! M I I : til :III:UII I 11 II I M 
IFDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMGGGKYIGQLINMYAVTIGGGSRPLV 
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91 



90 



151 



139 



211 



55 



60 



An ORF53ng nucleotide sequence <SEQ ID 483> was predicted to encode a protein having amino 
acid sequence <SEQ ID 484>: 

1 MPKKSCVYLW VFLILCIASA TINAGAVAIV TAAIVKMAIP SLMFDAGTVA
51 ALIMASCLII LVSGRYRALD RVSKIIIVTL SIATLAAAGI AMSRGMQMQP
101 DFIEPTPWTL AGLGFLIALM GWMPAPIEIS AINSLWVTEK QRINPSEYRD
151 GIFDFNVGYI ASAVLALVFL ALGAFVQYGN GEAVQMGGGK YIGQLINMYA
201 VTIGGGSRPL VAFIAFACMY GAASTVVDGY ARAIAEPVRL LRGKDKTARP
251 IVLLEKLGGR HRPGRDFLV*

Further analysis revealed a further partial gonococcal DNA sequence <SEQ ID 485>:

1 . . aagaAAAGCT GCGTTTATTT GTGGGTTTTT TTGATTTTGT GTATCGCCTC
51 CGCCACGATT AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA
101 AAATGGCGAT TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG
151 ATTATGGCAT CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT
201 GGATCGTGTT TCCAAAATCA TCATTGTTAC TTTGAGCATC GCCACGCTTG
251 CCGCCGCCGG CATCGCTATG TCGCGCGGTA TGCAGATGCA GCCCGATTTT
301 ATCGAGCCGA CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT
351 GATGGGCTGG ATGCCCGCGC CGATCGAAAT TTCCGCCATC AATTCTTTGT
401 GGGTAACCGA AAAACAACGC ATCAATCCTT CTGAATACCG CGACGGGATT
451 TTCGATTTCA ACGTCGGTTA TATCGCcagT GCGGTTTTGG CTTTGGTTTT
501 CCTTGCACTG GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA
551 TGGCGGGCGG CAAATATATC GGGCAATTGA TTAATATGTA TGCCGTAACC
601 ATCGGCGGCT GGTCTCGTCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT
651 GTACGGCACG ACGATTACCG TTGTGGACGG TTATGCGCGT GCCATTGCCG
701 AACCCGTGCG CCTGCTGCGC GGCAGGGATA AAACCGGCAA CGCCGAGTTG
751 TTtgccTGGA ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG
801 GTTTGACggc gcaaTGGCgG AACtgcTCAA ATTTGCGATG ATtgccgcCT
851 TTGTGTCCGC CCCTGTGTTC GCCTGGCTCA ACTACCGCCT CGTCAAAGGG
901 GACAAACGCC ACAGGCTTAC CGCCGGTATG AACGCCCTTG CCATTGTCGG
951 CCTGCTCTAC CTGGCCGGGT TTGCCGTTTT GTTCCTGTTG AACCTTACCG
1001 GACTTTTGGC ATAG



This corresponds to the amino acid sequence <SEQ ID 486; ORF53ng-1>:

1 KKSCVYLWVF LILCIASATI NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL
51 IMASCLIILV SGRYRALDRV SKIIIVTLSI ATLAAAGIAM SRGMQMQPDF
101 IEPTPWTLAG LGFLIALMGW MPAPIEISAI NSLWVTEKQR INPSEYRDGI
151 FDFNVGYIAS AVLALVFLAL GAFVQYGNGE AVQMAGGKYI GQLINMYAVT
201 IGGWSRPLVA FIAFACMYGT TITVVDGYAR AIAEPVRLLR GRDKTGNAEL
251 FAWNIWVAGS GLAVIFWFDG AMAELLKFAM IAAFVSAPVF AWLNYRLVKG
301 DKRHRLTAGM NALAIVGLLY LAGFAVLFLL NLTGLLA*



ORF53ng-1 and ORF53-1 show 94.0% identity in 336 aa overlap:
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orf 53-1. pep 
orf 53ng-l 



orf53-i -pep 
orf53ng-l 



orf 53-1. pep 
orf53ng-l 



orf 53-1. pep 
orf 53ng-l 



orf 53-1. pep 
orf53ng-l 



orf 53-1. pep 
orf 53ng-l 



60 70 80 90 100 HO 

ILTNLFKYPFFRFSAHYTLDTGKSLlEGYAEKSRVYLWFLILa 

KKS CVYLWV FL I LC I AS AT I N AGA V AI VTA 
10 20 30 

120 130 140 150 160 170 

AI VKMAI PS LMFDAGTVAALIMASCLI I LVSGRYRALDRVSKI I IVTLS I ATLAAAGIAM 

aU^Upsi^fdagtvaalimasclii^ 

4 0 50 60 70 80 



90 



180 190 200 210 220 23C 

srgmqmqpdfieptpwtlagi^fliai^gwmpapieisainsl^ 

100 no !™ 130 140 150 



120 



240 250 260 270 280 290 

fdfnvgyiasavlalvflalgafvqygngeavqmaggkyigqlikmyavtiggwsrplva 
fd^gyi^avlalvflau^evqygngeavqmaggkyigqlinmy 

160 170 180 190 200 1 ' 



210 



300 310 320 330 340 350 

FIAFACMYGTT ITWDGYARAI AEPVRLLRGKDKTGNAEFFAWK IWVAGSGLAVI FWFDG 

i i i tin i 1 1 1 1 1 1 1 1 1 1 1 u 1 1 1 1 1 1 1 1 : n n 1 1 1 : n 1 1 1 1 1 1 1 m i nmm 

FIAFACMYGTTITWDGYARAIAEPV^^ 

220 230 240 250 260 270 



400 



410 



360 370 380 390 

VMANLLKFAMIAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLL 

. ,, . , I m | I ) | | I | 1 ! I I I M I I ! I I I I I I : : I : I I : HI I I I • • ' I : M : II : I t M I 

^mjUli^famiaafvsapvfawlnyrlvkgdkrhrltag^ 

280 290 300 310 320 330 



65 



orf 53-1. pep NLAGMFKX 

I I : I : : 
orf53ng-l NLTGLLAX 




Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
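The document does not say how the putative transmembrane domains were predicted. One widely used approach is a sliding-window hydropathy average, and the sketch below illustrates it under that assumption, using the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6. The function names, window and threshold are illustrative choices and are not taken from this document.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence.

    Residues not in the scale (for example X or *) score 0.0 so that
    window positions still line up with sequence positions.
    """
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]
    if window <= 0 or len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_tm_windows(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean meets the cutoff."""
    return [i + 1
            for i, score in enumerate(hydropathy_profile(protein, window))
            if score >= threshold]


# A ten-residue hydrophobic stretch scored with a matching window size.
print(hydropathy_profile("LLLSLLILLA", window=10))
```

Windows that meet the cutoff are only candidates; a prediction of this kind would normally be weighed together with the homology results.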



Example 58 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 487>:

1 . . TTGCGGGAAA CGGCATATGT TTTGGATAGT TTTGATCGTT ATTTTGTTGT
51 TGCGCTTGCC GGCTTGTTTT TTGTCCGCGC ACAATCCGAA CGCGAGTGGA
101 TGCGCGAGGT TTCTGCGTGG CAGGAAAAGA AAGGGGAAAA ACAGGCGGAG
151 CTGCCTGAAA TCAAAGACGG TATGCCCGAT TTTCCCGAAC TTGCCCTGAT
201 GCTTTTCCAC GCCGTCAAAA CGGCAGTGTA TTGGCTGTTT GTCGGTGTCG
251 TCCGTTTCTG CCGAAACTAT CTGGCGCACG AATCCGAACC GGACAGGCCC
301 GTTCCGCCT. .

This corresponds to the amino acid sequence <SEQ ID 488; ORF58>:

1 . . LRETAYVLDS FDRYFVVALA GLFFVRAQSE REWMREVSAW QEKKGEKQAE
51 LPEIKDGMPD FPELALMLFH AVKTAVYWLF VGVVRFCRNY LAHESEPDRP
101 VPP. .



Further work revealed the complete nucleotide sequence <SEQ ID 489>:

1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT
51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG
101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA
151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT
201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA
251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT
301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG
351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG
401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC
451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA
501 AATTTCGCCC GTCCGTCCGG TTTTTAAAGA AATCACTTTG GAAGAAGCAA
551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC
601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA
651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC
701 AACGCACGTA TTCCCATATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG
751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC
801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCACCGTC
851 ATGCAGGGCA GGGGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC
901 CAAGGGCAGT CCGTTTCAGA CGGCACGGCC GTCCGCGATG CCCGCCGCCG
951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG
1001 CGCGAATTTC TCGCCTGATT CCGGAAAGTC AGACGGTTGT CGGGAAACGG
1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC
1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAACTGCC GATATCCATA
1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG
1201 CCGAAAGTTC CCATGACCGC AATCGATATT CAGCCGCCGC CTCCCGTATC
1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GTCAGGATTC GAGCAGGTGC
1301 AACGCAGCCG CATTGCCGAG ACCGACCATC TTGCCGATGA TGTTTTGAAT
1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGGATGACG GCAGTGAAGG
1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG
1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC
1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCCATC
1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC
1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG
1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT
1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT AATTACGCGT TATGAAATCG
1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATCT GGAAAAAGAT
1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC
1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA
1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC
1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC
2001 CGACTTGGGA AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG
2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC
2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT
2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA
2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA
2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGTAATCTTG CGGGCTTCAA
2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT
2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC
2401 GTGGTCGTGG TCGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA
2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA
2501 TCCATTTGAT TCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT
2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA
2601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG
2651 GTCAGGGCGA TATGCTGTTC CTGCTGCCGG GTACTGCCTA TCCGCAGCGC
2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA
2751 TTTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATT TTGAGCGGCG
2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGACGAAACC
2851 GATCCGATGT ACGACGAGGC CGTATCCGT? GTCCTGAAAA CGCGCAAAGC
2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG TATCGGCTAC AACCGCGCCG
2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA
3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA



This corresponds to the amino acid sequence <SEQ ID 490; ORF58-1>:

1 MFWIVLIVIL LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK
51 DGMPDFPELA LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS
101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR
151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI
201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSHM FDADKEAFSE
251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FHRHAGQGKG QAEAKSPDVS
301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESQTVVGKR
351 DVEMPSETEN VFTETVSSVG YGGPVYDETA DIHIEEPAAP DAWVVEPPEV
401 PKVPMTAIDI QPPPPVSEIY NRTYEPPSGF EQVQRSRIAE TDHLADDVLN
451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGKDSQAV CPFENVPSER
501 PSCRVSDTEA DEGAFPSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL
551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD
601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS
651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA
701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK
751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEKLPFI
801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG
851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LLPGTAYPQR
901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDDET
951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE
1001 HNGNRTILVP LDNA*



Computer analysis of this amino acid sequence predicts the indicated transmembrane region, and 
also gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF58 shows 96.6% identity over an 89aa overlap with an ORF (ORF58a) from strain A of N. meningitidis:
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orf58.pep 
orf 58a 

orf58.pep 
orf58a 



10 20 30 40 

LRETAYVLDSFDRYFWALA^Fn^RAQSEREWMREVSAWQEKKGEKQAELPE I KDGMPD 

MFWI VLIVILLLALAGLFFVRAQS EREWMREVSAWQEKKGEKQAELPEIKDGMPD 
" Jo 20 30 40 50 

70 80 90 100 

rprTALMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPP 
MINI 111 I 1 I 1 I I I I I I I I I I I I I I I I M M H I I I I I I I I 

60 To 80 



90 



100 



110 



The complete length ORF58a nucleotide sequence <SEQ ID 491> is:

1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT
51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG
101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA
151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT
201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA
251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT
301 GCAAATCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG
351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG
401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC
451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA
501 AATTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA
551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC
601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA
651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC
701 AACGCACGTA TTCCCGTATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG
751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC
801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC
851 ATGCAGGGCA GGGNAAAGGG CAGGCGGAGG CNAAATCCCC GGATGTTTCC
901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCNGCCGCCG
951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG
1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG
1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAANTGTTTC
1101 GTCTGTGGGA TACGGCGNTC CGGTTTATGA TGAAACTGCC GATATCCATA
1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG
1201 CCGAAAGTTC CCATGCCCGC AATNGATATT CCGCCGCCGC CTCCCGTATC
1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GGCAGGATTC GAGCAGGTGC
1301 AACGCAGCCG CATTGCCGAA ACCGATCATC TTGCCGATGA TGTTTTGAAT
1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGAATGACG GCAGTGAGGG
1401 TGTGGCAGAG CGGTCAAGCG GGCAATATTT GTCGGAAACC GAAGCGTTCG
1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC
1501 CCGTCCCGCC GGGCATNGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC
1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC
1601 TGCCGCCGCT GTTCAATCCC GGGGCGACGC AAACCGAAGA AGANCTGTTG
1651 GANAACAGCA TCACCATCGA AGAAAAATKG GCGGAGTTCA AAGTCAAGGT
1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG
1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTAAATCT GGAAAAAGAN
1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCT
1851 CGGCAAAACC TGTATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA
1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC
1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC
2001 CGACTTGGGC AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG
2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC
2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT
2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA
2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA
2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGTNTCAA
2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG GGAGAAAATC GGCAACCCGT
2351 TCAGCCTCAC GCCCGACAAT CCCGAACCTT TGGANAAATT GCCGTTTATC
2401 GTGGTCGTGG TTGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA
2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA
2501 TCCATCTTAT CCTTGCCACA CAACGCCCCA GTGTCGATGT CATCACGGGT
2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA
2601 AATCGACAGC CGCACGATTC TTGACCAAAT GGGTGCGGAA AACCTGCTCG
2651 GGCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACGGCCTA TCCGCAGCGC
2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA
2751 TCTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATN TTGAGCGGCG
2801 GTATGTCCGA CGATTTGCTG GGAATCAGCC GGAGCGGCGA CGGCGAAACC
2851 GATCCGATGT ACGACGAGGC CGTGTCNGTT GTTTTGAAAA CGCGCAAAGC
2901 CAGCATTTCT GGCGTGCAGC GCGCATTGCG TATCGGCTAT AATCGCGCCG
2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA
3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTNGACAATG CTTGA



This encodes a protein having amino acid sequence <SEQ ID 492>: 
1 MFWIVLIVIL LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK
51 DGMPDFPELA LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS
101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR
151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI
201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE
251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQGKG QAEAKSPDVS
301 QGQSVSDGTA VRDAXRRVSV NLKEPNKATV SAEARISRLI PESRTVVGKR
351 DVEMPSETEN VFTEXVSSVG YGXPVYDETA DIHIEEPAAP DAWVVEPPEV
401 PKVPMPAXDI PPPPPVSEIY NRTYEPPAGF EQVQRSRIAE TDHLADDVLN
451 GGWQEETAAI ANDGSEGVAE RSSGQYLSET EAFGHDSQAV CPFENVPSER
501 PSRRAXDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP GATQTEEXLL
551 XNSITIEEKX AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKX
601 LARSLGVASI RVVETILGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS
651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA
701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK
751 RYRLMSFMGV RNLAGXNQKI AEAAARGEKI GNPFSLTPDN PEPLXKLPFI
801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG
851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR
901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDX LSGGMSDDLL GISRSGDGET
951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE
1001 HNGNRTILVP XDNA*
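
The relationship between a nucleotide listing and the protein it encodes can be checked with a short translation routine. The following is only an illustrative sketch: it assumes the standard genetic code and reading frame 1, and the helper name `translate` is hypothetical rather than part of the original analysis. Codons containing an ambiguous base such as N or K are reported as X, which is how the ORF58a protein listing shows unresolved positions.

```python
# Standard genetic code, indexed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored and any codon
    containing a non-ACGT symbol is reported as 'X'."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


# First 30 bases of the ORF58a listing (row 1) and the start of row 1651.
print(translate("ATGTTTTGGA TAGTTTTGAT CGTTATTTTG"))  # MFWIVLIVIL
print(translate("GANAACAGCA TCACCATCGA"))             # XNSITI
```

Applied to the full 3045-base ORF58a listing, a routine of this kind would be expected to yield 1014 residues followed by a stop codon.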



ORF58a and ORF58-1 show 96.6% identity in 1014 aa overlap: 
                     10        20        30        40        50        60
orf58a.pep   MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58-1      MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf58a.pep   LMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58-1      LMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf58a.pep   EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58-1      EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf58a.pep   EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||:|
orf58-1      EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf58a.pep   FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQGKGQAEAKSPDVS
             |||||||||||||||||||||||||||||||||||||||||:||||||||||||||||||
orf58-1      FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS
                    250       260       270       280       290       300
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310 320 330 340 350 360 

orf 58a . pep QGQSVSDGTAVRDAXRRVSVNLKEPNKATVSAEARISRLIPESRTWGKRDVEMPSETEN 
I I I I I I I I I II I I I I I II I I 1 I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I 
orf 58-1 QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESQTWGKRDVEMPSETEN 

310 320 330 340 350 360 
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60 
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70 



370 380 390 400 410 420 

orf 58a . peD VFTEXVSSVGYGXPVYDETADIHIEEPAAPDAWWEPPEVPKVPMPAXDI PPPPPVSEIY 

iiiciiiiMi 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i ii mini ii 

orf 58-1 VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWWEPPEVPKVPMTAIDIQPPPPVSEIY 
370 380 390 400 410 420 

430 440 450 460 470 480 

orf 58a . peo NRTYEPPAG FEQVQRSRIAET DHLADDVLNGGWQEETAAIAN DGSEGVAERSSGQYLSET 
II I I I II : I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I : I I II I : I I II II I II II I 
orf 58-1 NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf 58a . pep EAFGHDSQAVCPFENVPSERPSRRAXDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP 
I 1 I I I I I I I I I I 1 I I I I I I I I I I: MINIMI 1 I f I I I ! I I 1 I I I I I I 1 I I I | 1 1 I 
orf 58-1 EAFGHOSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTT DLLLPPLFNP 

490 500 510 520 530 540 






orf58a.pep 
orf58-l 

orf58a.pep 
orf58-l 

orf58a.pep 
orf58-l 

orf58a.pep 
orf58-l 

or f 58a. pep 
orf58-l 

orf58a.pep 
orf58-l 

or f 58a. pep 
orf5B-l 



580 590 S00 

710 



610 
670 



620 
680 



630 
690 



640 
700 



660 
720 



ft 680 690 /uu /AU 

670 680 



EGIPH 
I II i 

egip; 



740 750 760 770 780 

•HLLAPVVT DMKLAANALNWCVNEMEKRYRLMS FMGVRNLAGXNQKI AEAAARGEKI 



750 
810 



Ton 800 810 820 830 840 

G KPFSLTPDNPEPI*KLPFIVVWDE^ 

E%ei*s 



p-cn B60 870 880 890 900 

QRPSVDVITGLIKT^IPTRIAFQVSSKIDSRTILDQMGAENLLGQGD^^PPGTAYPQR 



850 

oin 920 930 940 950 960 



860 
920 



870 
930 



880 
940 



890 
950 



900 
960 



Q70 980 990 1000 1010 

Homology with a predicted ORF from N. gonorrhoeae

ORF58 shows complete identity over a 9aa overlap with a predicted ORF (ORF58ng) from N. gonorrhoeae:

orf58.pep   ALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPP  103
                                          |||||||||
orf58ng                                   SEPDRPVPPASANRADVPTASDGYSDSGNG  30

The ORF58ng nucleotide sequence <SEQ ID 493> is predicted to encode a protein having partial 
amino acid sequence <SEQ ID 494>: 



i 

51 
101 
151 
201 
251 
301 



crpnRPVPPA S ANRADV PT A S DGYSDSGNG TEEAETEAAE AAEEEAADTE 
DIATAVIDNR RIPFDRSXAE GLMQSESKTS PVRPVFKEIT LEEATRALSS 

SSStSy idafekngta vpkvrvswp meglqugld DPVLQRTYSR 
^ESs esadygfepy fekqhpsafs avkaenarna pfrrhagqek 

SOGOSVSDGT AVRDARRRVS VNLKEPNKAT VSAEARISRL 
IPESRTVVGK SsS SvHTETVSSV GYGGPVYOEA gjggg* 
PDAWWEPPE VPEVAVPEID ILPPPPVSEI YNRTYEPPAG FEQAQRSRIA 
351 ETDHLAADVL NGGWQEETAA IADDGSEGAA ERSSGQYLSE TEAPGHDSQA
401 VCPFEDVPSE RPSCRVSDTE ADEGAFQSEE TGAVSEHLPT TDLLLPPLFN
451 PEATQTEEEL LENSITIEEK LAEFKVKVKV VDSYSGPVIT RYEIEPDVGV
501 RGNSVLNLEK DLARSLGVAS IRVVETIPGK TCMGLELPNP KRQMIRLSEI
551 FNSPEFAESK SKLTLALGQD ITGQPVVTDL GKAPHLLVAG TTGSGKSVGV
601 NAMILSMLFK AAPEDVRMIM IDPKMLELSI YEGITHLLAP VVTDMKLAAN
651 ALNWCVNEME KRYRLMSFMG VRNLAGFNQK IAEAAARGEK IGNPFSLTPD
701 DPEPLEKLPF IVVVVDEFAD LMMTAGKKIE ELIARLAQKA RAAGIHLILA
751 TQRPSVDVIT GLIKANIPTR IAFQVSSKID SRTILDQMGA ENLLGQGDML
801 FLPPGTAYPQ RVHGAFASDE EVHRVVEYLK QFGEPDYVDD ILSGGGSEEL
851 PGIGRSGDGE TDPMYDEAVS VVLKTRKASI SGVQRALRIG YNRAARLIDQ
901 MEAEGIVSAP EHNGNRTILV PLDNA*

This partial gonococcal sequence contains a predicted transmembrane region and a predicted
ATP/GTP-binding site motif A (P-loop; double underlined). Furthermore, it has a domain
homologous to the FtsK cell division protein of E. coli. Alignment of ORF58ng and FtsK
(accession number P46889) shows 65% amino acid identity in a 459 aa overlap:

IEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRVVET 526 
+E +LA+F++K W+ GPVITR+E+ GV+ + NL +DLARSL ++RWE 
VEARLADFRIKADVVNYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRVVEV 927 

IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPWTDLGKAPHL 586 
IPGK +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PW DL K PHL 
IPGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTWLGKDIAGEPWADLAKMPHL 987 

L V AGTTG S GKS VG VN AM I L SML FKAAPE D VRM I M I D PKMLE LSI YEG I T H LLAP WT DMK 64 6 
LVAGTTGSGKS VGVNAMI LSML+KA PEDVR IMI DPKMLELS+YEGI HLL WTDMK 
LVAGTTGSGKSVGVNAMILSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEWTDMK 1047 

IAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP — 704 
AAKAL WCVNEME+RY+LMS +GVRNLAG+N+KIAEA I +P+ D + 





ORF58ng: 


4 67 


20 


FtsK: 


868 




ORF58ng: 


527 




FtsK: 


928 


25 


ORF58ng: 


587 




FtsK: 


988 


30 


ORF58ng: 


647 


FtsK: 


1048 




ORF58ng: 


705 


35 


FtsK: 


1108 




ORF58ng: 


763 


40 


FtsK: 


1168 




ORF58ng: 


823 




FtsK: 


1228 


45 


ORF58ng: 


883 




FtsK: 


1287 



L+K P+IW+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL 
PVLKKEPY I WLVDE FADLMMTVGKKVEELI ARLAQKARAAG IH LVLATQRPSVDVI TGL 1 167 

IKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQRVHGAFASDEEV 822 
IKANIPTRIAF VSSKIDSRTILDQ GAE+LLG GDML+ P + P RVHGAF D+EV 
IKANIPTRIAFTVSSKIDSRTILDQAGAESLLGMGDMLYSGPNSTLPVRVHGAFVRDQEV 1227 

HRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSWLKTRKASISG 882 
H W+ K G P YVD IS SE G G G E DP++D+AV V + RKASISG 
HAWQDWKARGRPQYVDGITSDSESEGGAG-GFDGAEELDPLFDQAVQFVTEKRKASISG 1286 

VQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVP 921 
VQR RIGYNRAAR+I+QMEA+GIVS HNGNR +L P 
VQRQFR I GYNRAARI I EQME AQG I VS EQGHNGNREVLAP 1325 
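
The ATP/GTP-binding site motif A (P-loop) noted above for the partial gonococcal sequence can be located by a simple pattern search. The sketch below assumes the commonly used PROSITE-style consensus [AG]-x(4)-G-K-[ST]; the specification does not state which pattern or program was actually used, and the function name `find_p_loops` is hypothetical.

```python
import re

# PROSITE-style consensus for ATP/GTP-binding site motif A (P-loop):
# [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein: str):
    """Return (1-based start, matched text) for each non-overlapping match."""
    seq = "".join(protein.split()).upper()
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(seq)]


# Residues 571-600 of the partial ORF58ng listing (rows 551 and 601).
fragment = "GKAPHLLVAG TTGSGKSVGV NAMILSMLFK"
print(find_p_loops(fragment))  # [(10, 'GTTGSGKS')]
```

The reported start is relative to the fragment passed in, so an offset must be added to obtain coordinates in the full-length protein.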

Further work on ORF58ng revealed the complete gonococcal DNA sequence to be <SEQ ID 495>: 

1 ATGTTTTGGA TAGTTTTGAT CGTTATtgtg TTGCTTGCGC TTGCCGGCCT 

50 51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAGTTTTCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

55 301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGGTATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGC AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCgA TACgGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC 

4 51 ATCCcatTCG ACCGGAGTAT TGCTGAAGGG TTGATGCAGT CTGAAAGCAA 

501 AACTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA 

60 551 CGCGTGCTTT AAGCAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGG AACAGCCGTC CCCAAAGTAC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCGTATG TTTGATGCGG ACAAAGAAGC GTTTTCCGAG 

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC 
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801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC 

851 ATGCAGGGCA GGAGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC 

901 CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCCGCCGCCG 

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

c 100 1 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC 

uol SSgtSS tacggcggtc cggtttatga tgaagctgcc gatatccata 

1151 TTGAAGAGCC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 
1201 ccIgaggtag CCGTACCCGA AATCGATATT CTGCCGCCGC CTCCCGTATC 
10 125 SSSSi? ScCGTACCT ATGAGCCGCC GGCAGGATTC GAGCAGGCGC 

1301 AACGCAGCCG CATTGCCGAA ACCGACCATC TTGCCGCTGA TGTTTTGAAT 
1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCAGATGACG GCAGTGAGGG 
1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG 
1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAGATGTGCC GTCTGAACGC 
i < 15 oi CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC 

lJ 1551 GGAAGAGACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG 
1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT 
1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG 
90 1151 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATTT GGAAAAAGAC 

^ 18 0i TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC 

1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 
1901 TACGCCTGAG CGAAATTTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC 
1951 AAGCTGACGC TCGCGCTCGG TCAGGACATT ACCGGACAGC CCGTCGTAAC 
2001 CGACTTGGGC AAAGCACCGC ATTTGCTGGT TGCCGGCACG ACCGGTTCGG 
2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 
2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT 
2151 GAGCATTTAC GAAGGCATCA CGCACCTGCT CGCCCCTGTC GTTACCGATA 
2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA 
™ 2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGCTTCAA 

3U 2301 CCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT 

2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC 
2401 GTGGTCGTGG TCGATGAGTT TGCCGATTTG ATGATGACGG CAGGCAAGAA 
2451 AATCGAAGAA CTGATTGCGC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA 
2501 TCCACCTTAT CCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT 
° 2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 

2601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG 
2651 GTCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACTGCCTA TCCGCAGCGC 
2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA 
40 2751 TCTGAAGCAG TTTGGCGAGC CGGACTATGT TGACGATATT TTGAGCGGCG 

2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGGCGAAACC 
2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC 
2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG CATCGGCTAC AACCGCGCCG 
2951 CGCGTCTGAT TGACCAAATG GAAGCGGAAG GCATTGTGTC CGCACCGGAA 
45 3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA 

This corresponds to the amino acid sequence <SEQ ID 496; ORF58ng-l>: 

1 MFWIVLIVIV LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK
51 DGMPDFPEFS LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS
101 ANRADVPTAS DGYSDSGNGT EEAETEAAEA AEEEAADTED IATAVIDNRR
151 IPFDRSIAEG LMQSESKTSP VRPVFKEITL EEATRALSSA ALRETKKRYI
201 DAFEKNGTAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE
251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQEKG QAEAKSPDVS
301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESRTVVGKR
351 DVEMPSETEN VFTETVSSVG YGGPVYDEAA DIHIEEPAAP DAWVVEPPEV
401 PEVAVPEIDI LPPPPVSEIY NRTYEPPAGF EQAQRSRIAE TDHLAADVLN
451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFEDVPSER
501 PSCRVSDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL
551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD
601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS
651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA
701 APEDVRMIMI DPKMLELSIY EGITHLLAPV VTDMKLAANA LNWCVNEMEK
751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEKLPFI
801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG
851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR
901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDGET
951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE
1001 HNGNRTILVP LDNA*
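
A transmembrane region such as the one predicted for the ORF58 sequences is typically flagged by a sliding-window hydropathy scan. The specification does not name the program used, so the following is only a sketch under stated assumptions: it uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a mean score cutoff of 1.6, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values per residue.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment: str) -> float:
    """Average hydropathy of a segment; unknown residues (e.g. X) count as 0."""
    seq = "".join(segment.split()).upper()
    return sum(KD.get(res, 0.0) for res in seq) / len(seq)


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6):
    """Return 1-based start positions of windows whose mean hydropathy
    reaches the threshold (candidate membrane-spanning stretches)."""
    seq = "".join(protein.split()).upper()
    return [
        start + 1
        for start in range(len(seq) - window + 1)
        if mean_hydropathy(seq[start:start + window]) >= threshold
    ]


# First 40 residues of ORF58ng-1: the hydrophobic N-terminus scores highly.
n_terminus = "MFWIVLIVIV LLALAGLFFV RAQSEREWMR EVSAWQEKKG"
print(hydrophobic_windows(n_terminus)[:3])

# An acidic stretch further along the protein yields no candidate windows.
print(hydrophobic_windows("EEAETEAAEA AEEEAADTED"))  # []
```

Dedicated topology predictors use more elaborate models, so a scan of this kind should be read as a rough indication only.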

ORF58ng-1 and ORF58-1 show 97.2% identity in 1014 aa overlap:
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70 



80 



180 



130 140 150 160 170 

EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL 

iiiiM | | | | | | | I | | | | | || I i I I I I II I I I I I I I I I I I I HI: I I I I 1 1 I I I 1 1 I 
EEAETEAAEAAEEEAADTE DI ATAVI DNRRI PFDRS IAEGLMQSESKTS PVRPVFKE ITL 
140 150 160 170 180 



130 

190 200 210 220 230 240 

EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM 

I M I I I I : I I I I 1 I I M I I I I I I 1 1 I I i M I I I I I I I I I I I I H M I I HI I I I I I I: I 
EEATRALSSAALRETKKRYIDAFEKNGTAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM 

190 200 210 220 230 240 

250 260 270 280 290 300 

FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS 

i M I I I | I I M 1 I I I I I I II I I ! II I M i I I H M I I I I I I : I I M I 111111111111 
FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQEKGQAEAKSPDVS 

260 270 280 290 30C 



250 



360 



310 320 330 340 350 

QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLI PESQTWGKRDVEMPSETEN 
Y , | t M i | 1 1 M I I I M I 1 t 1 I M t M I ! M I I M I I I I I I I I : I I I f I t I M I I I I I U 
OGQS VS DGT AVRDARRRVSVN LKE PNKATVS AEARI SRU PE SRT WGKRDVEMPS ETEN 

310 320 330 340 350 360 



370 380 390 400 410 420 

VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWWEPPEVPKVPMTAIDIQPPPPVSEIY 

ii iii illinium in i mm mi i mmiiiii:! : 1,1 m,,l 'ii 

VFTETVSSVGYGGPVYDEAABIHIEEPAAPDAWWEPPEVPEVAVPEIDILPPPPVSEIY 
380 390 400 410 420 



370 

430 440 450 460 470 480 

NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 

| | | | | | I : I I I I : I 1 I I M I I 1 I I I I I I I I I J I • I I I I I H I I I I I I I I I I I 1 I I I I I I 
NRTYEPPAGFEQAQRSRIAETDHLAADVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 

430 440 450 460 470 480 



490 500 510 520 530 540 

EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFN? 

TumiiimimiiiimimNiiim imiimmmnimm 

EAFGHDSQAVCPFEDVPSERPSCRVSDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP 
490 500 510 520 530 540 



550 560 570 580 590 600 

EATOTEEELLENSITIEEKLJ^EFKVKVKWDSYSGPVITRYSIEPDVGVRGNSVLNLEKD 
lY Ml 1 1 M 1 1 1 1 1 1 1 1 I I M M I i i 1 1 M 1 1 1 1 1 1 1 1 1 II M I I l I I I 1 I 1 1 1 1 1 1 i 1 1 
^T^EEELLENSITIEEKLAEFKVKVKWDSYSGPV 

610 620 630 640 650 660 

LARS LGVAS IRWET I PGKTCMGLELPN PKRQMI RLSEI FNS PE FAESKSKLTLALGQDI 
I I 1 I M 1 1 1 I 1 I II I t 1 t I 1 I I t I I I M I I I I I M 1 I I M M 1 I I I I I I I t I I 1 1 I I U I 
IARSL^ASIRVVETIPGKTCMGLELPKPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 

610 620 630 640 650 660 

670 680 690 700 710 720 

TGQPWTDLGKAPttLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 

1 1 1 1 1 1 1 1 1 1 1 1 n i m 1 1 MHiiiiiimimmmmii m niiinm 

TGQPV\TTDLGKAPKLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 
670 680 690 700 710 720 
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730 740 750 760 770 780 

orf58-l pep EGIPHLLAPmDMKIAANALNWCVTOMEKRYR^ 

III II N n | | | | I II I 1 I I M I I I II I | H | | | | | | u U I I I I I II I I I I II 

orf58ng-l egithliapvvtdmklaanalnwcwemekryri^sfmgvrniagfnqkiaeaaargbki 

730 740 750 760 770 7 80 

790 800 810 820 830 840 

or f 58-1 . pep GNPFSLTPDDPEPIXKLPFIVVWDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 
I I I II I I 1 I I I 1 II | || I I I I I I I M I I M I M I I I I I I I II I I I I I II 1 I I II I I I U I 
orf58ng-l GNPFSLTPDDPEPLEKLPFIWVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 

790 800 810 820 830 840 

850 860 870 880 890 900 

orf 58-1. pep QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLLPGTAYPQR 
I I 1 1 t I J I I I 1 1 t I 1 1 I 1 1 1 f 1 1 I 1 1 1 1 i i ! S I I 1 I | I I 1 | 1 1 I I I I 1 1 I 1 It I I M I I 
orf58ng-l QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAEN1XGQGDMLFLPPGTAYPQR 

850 860 870 880 890 900 

910 920 930 940 950 960 

orf 58-1 . pep VHGAFASDEEVHRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV 

ii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 illinium 

orf58ng-l VHGAFASDEEVHRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSV 
910 920 930 940 950 960 

970 980 990 1000 1010 

orf 58-1 . pep VLKTRKASISGVQRALRIGYKRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX 
I I I II 1 1 I I M I I I 1 1 I I I I 1 1 1 1 II 1 1 II I I I I I I I I I I I I I 1 1 I I I I I 1 1 1 1 I 
orf58ng-l VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX 
970 980 990 1000 1010 
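
The identity figures quoted for ungapped overlaps such as this one amount to counting matching positions. The sketch below illustrates the arithmetic on residues 181-240 of the two sequences; the helper name `percent_identity` is hypothetical, and it assumes both segments are already aligned without gaps.

```python
def percent_identity(seq_a: str, seq_b: str):
    """Return (matches, overlap length, percent identity) for two
    equal-length, gap-free aligned segments."""
    a = "".join(seq_a.split()).upper()
    b = "".join(seq_b.split()).upper()
    if len(a) != len(b):
        raise ValueError("segments must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, len(a), 100.0 * matches / len(a)


# Residues 181-240 of ORF58-1 and ORF58ng-1.
orf58_1 = "EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM"
orf58ng_1 = "EEATRALSSAALRETKKRYIDAFEKNGTAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM"
print(percent_identity(orf58_1, orf58ng_1))  # (57, 60, 95.0)
```

Over the whole 1014 aa overlap the same count gives the overall figure; whether an unresolved X is scored as a mismatch depends on the alignment program, which this sketch does not attempt to reproduce.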



Furthermore, ORF58ng-1 shows significant homology to the E. coli protein FtsK:

sp|P46889|FTSK_ECOLI CELL DIVISION PROTEIN FTSK >gi|1651412|gnl|PID|d1015290 (Dl
division protein FtsK [Escherichia coli] >gi|1651418|gnl|PID|d1015296 (D90727) Cell
division protein FtsK [Escherichia coli] >gi|1787117 (AE000191) cell division
protein FtsK [Escherichia coli] Length = 1329
Score = 576 bits (1469), Expect = e-163

Identities = 301/459 (65%), Positives = 353/459 (76%), Gaps = 5/459 (1%)
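
The percentages in the summary line above follow directly from the residue counts. As a small worked check (the function name `blast_percent` is hypothetical), the reported values are consistent with truncating to a whole percent rather than rounding, since 353/459 is about 76.9% yet is shown as 76%.

```python
def blast_percent(count: int, length: int) -> int:
    """Whole-number percentage obtained by truncation (integer division)."""
    return count * 100 // length


print(blast_percent(301, 459))  # 65 (identities)
print(blast_percent(353, 459))  # 76 (positives)
print(blast_percent(5, 459))    # 1  (gaps)
```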
Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 59

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 497>:

5 1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

' 51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCG TGCCGCCGAC GGGC. .GTGA TCGCCATCGA TGCCGTGTTG 

151 GCATTGGTCG GCTTCTGGGT C . - 

a TTGCCATCGG TTTGTTTTTA ATTTACCAAA ACGGGCTGAC 

10 H\ CCTGCTTTTT gISgTGG AAGACGGCAA AATCCATTTT TGGCTCGGAC 

."I rrrTGCCTAT GCACATTATC ATGTTTGTCC TTGCACTCAT CCTGTTGCGC 

losi SSS3X ^Sgcca gcccttctgg caggcggttg gcaaaagtct 

1101 GACATTGAAA GGCGGAAAAT GA 

This corresponds to the amino acid sequence <SEQ ID 498; ORF101>:

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GXVIAIDAVL

51 ALVGFWV -

301 IAIGLFL IYQNGLTLLF EAVEDGKIHF WLGLLPMHII MFVLALILLR

351 VRSMPSQPFW QAVGKSLTLK GGK*

Further work revealed the complete nucleotide sequence <SEQ ID 499>: 

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

A CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

xll -GCTCGGCCG TGCCGCCGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCA 

o< III ttStSgc? TCTGGGTCAT CGGTATGACG CCGCTTTTGC TGGTGTTGAC 

25 \t\ CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGCGACAGCG 

25i SSSSS CTGGCTATCC tgcggattgg cattgaaaca atggatacgc 

III CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA 
III GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA 
in 401 tStgaagca gaagcaggaa TTGTCTTTGG TGGAGGCAGG CGAGTTCAAC 

30 451 agtttSgca agcgcaacgg cagggtttat tttgtcgaaa ccttcgatac 

tni CGAATCCGGC atcatgaaaa acctgttcct gcgcgaacag gacaaaaacg 

GCGGCGACAA CATCATCTTC GCCAAAGAAG GTAACTTCTC GCTGAACGAC 

111 aaSgX cgctcgaatt gcgccacggc taccgttaca gcggcacgcc 

,c ll\ CGGACGCGCC GACTACAATC AGGTTTCCTT CCAAAAACTC AACCTGATTA 

35 tl\ SScCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACCATT 

ill cSScCC AACTGATTGG CAGCAGCAAC CCGCAACATC AGGCGGAATT 

III GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG 

fts} CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC 

4ft III ttgIttS tcggtttgtt tttaatttac caaaacgggc tgaccctgct 

40 III TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC 

1001 CTA-GCACAT TATCATGTTT GCCGTTGCAC TCATCCTGTT GCGCGTCCGC 

llll StatgS GCCAGCCCTT ctggcaggcg GTTGGCAAAA GTCTGACATT 

1101 GAAAGGCGGA AAATGA 

This corresponds to the amino acid sequence <SEQ ID 500; ORF101-1>:

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA
51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR
101 PVMQFAVPFA VLVAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN
151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLND
201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI
251 PTAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF AVALILLRVR
351 SMPSQPFWQA VGKSLTLKGG K*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF101 shows 91.2% identity over a 57aa overlap and 95.7% identity over a 69aa overlap with
an ORF (ORF101a) from strain A of N. meningitidis:

10 20 30 40 50 

or f 101 . pep MI YQRNLIKELS FTAVGI FWLLAVLVSTQAINLLGRAADGXVI AI DAVLALVGFWVX - 

I I I II II I I I I I I I I I I t I I I I I I I II I I I I I I I II III I | I I II I I I I I I I I 
orflOla MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGXAADXRX-AIDAVLALVGFWVXXM 
10 20 30 40 50 

// 

90 100 110 

orf 101 . pep IAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL 

I I I M I I I I II I I II I I I I I I I I I I I I I I I 
orf 101a LTVSVLLLCLLAVPLSYFNPRSGHTYNILXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL 
280 . 290 300 310 320 330 

120 130 140 150 

orf 101. pep LPMHIIMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGKX 
III I I III t:!::MMM Milll I I! HI III Mill! 
orf 101a LPMHIIKFVIAIVLLRVRSMPSQPFWQAVGKSLTLKGGKX 
340 350 360 370 

The complete length ORF101a nucleotide sequence <SEQ ID 501> is:

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCN TGCCGCCGAC NGGCGTNTCG CCATCGATGC CGTGTTGGCA 

151 TTGGTCGGCT TCTGGGTCNN NNGNATGACG CCGCTTTTGC TNGTGTTGAC 

201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGNGACAGCG 

251 AAATGTCGGT CTGGNTATCC TGCGGATTGG CATTGAAACA ATGGATACGC 

301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA 

351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA 

401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGGGTTCAAC 

451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC 

501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG 

551 GCGGCGACAA CATCATCTTC NCCAAAGAAA GTAACTTCTC GCTGAACGAC 

601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC 

651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCNAAAACTC AACCTGATTA 

701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACNATN 

751 CCNACNGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC ANGCGGAATT 

801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG 

851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC 

901 TTGANTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT 

951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC 

1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC 

1051 AGCATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT 

1101 GAAAGGCGGA AAATGA 

This encodes a protein having amino acid sequence <SEQ ID 502>: 

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGXAAD XRXAIDAVLA
51 LVGFWVXXMT PLLLVLTAFI STLTVLTRYW RDSEMSVWXS CGLALKQWIR
101 PVMQFAVPFA VLVAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGGFN
151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF XKESNFSLND
201 NKRTLELRHG YRYSGTPGRA DYNQVSFXKL NLIISTTPKL IDPVSHRRTX
251 PTAQLIGSSN PQHXAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LXAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR
351 SMPSQPFWQA VGKSLTLKGG K*

ORF101a and ORF101-1 show 95.4% identity in 371 aa overlap:

orf 101a . pep MI YQRNLI KELS FTAVG I FWLLAVLVSTQAI NLLGXAADXRXAI DAVLALVGFWVXXMT 60 

I II I I I I I I I M I I I I I I I I I I I I I I I I I I I I t I I I III I J } I I I | t I I I 1 I I il 
orf 101-1 MI YQRNL I KE LS FTAVG I FWLLAVLVSTQAINLLGRAADGRVAI DAVLALVGFWVIGMT 60 

orf 101a .pep PLLLVLTAFI STLTVLTRYV?RDSEMSVWXSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 120 

I I I I I I I N I I I I I I I I I I I I I I I I I I I I I N I I I I I I I I I I I I I I I I I | | | | | | | | | | 
orfl01-l PLLLVLTAFI STLTVLTRYWRDSEMS VWLSCGLALKQW IRPVMQFAV PFAVLVA VMQLWV 120 
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orflOla.pep 
orfl01-l 
orflOla.pep 
orfl01-l 
orflOla.pep 
orfl01-l 
orf iOla.pep 
orf 101-1 



IPWAEIJISREYAEILKQKQELSLVEAGGFNSLGKRNGRVYFVETFDTESGI 1B0 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 H 1 1 1 1 U 1 1 1 1 1 1 1 1 1 1 II I 1 1 MINI IHM I 1 1 1 

IPWAELRSREYAE1LKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 180 

DKNGGDNI I FXKESNFSLNDNKRTLELRHGYRYSGTPGRADYNQVS FXKLNLI I STTPKL 2 4 0 

I M 1 t I I I I I 11:11 II II IN III Nil II llllll I litllt II Mi II I MM II 
DKNGGDNI I FAKE GNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQK1NLIISTTPKL 2 

IDPVSKRRTXPTAQLIGSSNPQHXAELMWRISL7VSVLLLCLLAVPLSYFNPRSGHTYNI 300 

I I 1 | | ! I I I mm I I MM I Mill MIMIIMMIIMM MIIMMM MM 

I DPVSHRRTI PTAQLIGSSNPQHQAEIMWRI SLT^ * 300 

LXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA 360 

I ||imMMIMIIMMMMMMMMmMM:M::IMMMMMMM 

LI AI GLFLI YQNGLTLLFEAVEDGKI HFWLGLLPMHI IMFAVAL ILLRVRSMPSQP FWQA 360 



VGKSLTLKGGK 371 
III I I Ml Ml 
VGKSLTLKGGK 371 



orflOla.pep 
orfl01-l 

Homology with a predicted ORF from N. gonorrhoeae

ORF101 shows 96.5% identity in 57aa overlap at the N-terminal domain and 95.1% identity in
61aa overlap at the C-terminal domain, respectively, with a predicted ORF (ORF101ng) from
N. gonorrhoeae:



orf101.pep    MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWV 57
              ||||||||||||||||||||||||||||||||||||||||| | |||||||||||||
orf101ng      MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRV-AIDAVLALVGFWVIGM 59

                                      //

orf101.pep                                  IAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 333
                                            ||||||||||||||||||||||||||||||
orf101ng      SLTVSVLLLCLLAVPLSYFNPRSGHTYNILIAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 331

orf101.pep    LLPMHIIMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGK 373
              ||||||||||:|::|||||||||||||||||
orf101ng      LLPMHIIMFVIAIVLLRVRSMPSQPFWQAVG 362



The ORF101ng nucleotide sequence <SEQ ID 503> is predicted to encode a protein having partial
amino acid sequence <SEQ ID 504>:



1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA
51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR
101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN
151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD
201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI
251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR
351 SMPSQPFWQA VG



Further work revealed the complete nucleotide sequence <SEQ ID 505>: 



1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG
51 CATTTTCGTC GTCCTCTTGG CGGTGTTGGT GTCCACGCAG GCGATCAACC
101 TGCTTGGCCG CGCAGCTGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCC
151 TTAGTCGGCT TCTGGGTCAT CGGTATGACC CCGCTTTTGC TGGTGTTGAC
201 CGCATTCATC AGCACGCTGA CCGTATTGAC CCGCTACTGG CGCGACAGCG
251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CGTTGAAACA GTGGATACGC
301 CCCGTCATGC AGTTTGCCGT GCCGTTTGCC ATCCTGATTG CCGTCATGCA
351 GCTTTGGGTG ATACCGTGGG CAGAGCTGCG CAGCCGCGAA TATGCCGAAA
401 TTTTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAAGCCGG CGAGTTCAAT
451 AACTTGGGCA AGCGCAACGG CAgggtttaT TtcgtcgaaA CCTTTGACAC
501 CGaatccgGC ATCATGAAAA ACCTGTtcct GcGCGAACAG GACAAAAACG
551 gcggcgacaA CATCATCTTC GCcaaaGAag graactTctc gctgaaggaC
601 AACAAAcgca cgctcgaATT GCGCCACGGC TACCGTTACA GCGGcacgcC
651 CGGacGCGCc gactaCAATC AGGTTtCCtt cCAAAAacTc aacctgATta
701 TCAGCACCAC GCCCAAacTT ATCGaccCCG TTTCCCACCG CCGCACCATT
751 tcgacCGCCC AAcTGATTGG CAGCAGCAAT CCGCAACATC AGGCAGAATT
801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTGCTC TGCCTACTCG
851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC
901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT
951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC
1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC
1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT
1101 GAAAGgcgGA AAATGA




This corresponds to the amino acid sequence <SEQ ID 506; ORF101ng-1>:

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA
51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR
101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN
151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD
201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI
251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR
351 SMPSQPFWQA VGKSLTLKGG K*



ORF101ng-1 and ORF101-1 show 97.6% identity in 371 aa overlap:

                      10        20        30        40        50        60
orf101-1.pep  MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf101ng-1    MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT

                      70        80        90       100       110       120
orf101-1.pep  PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV
              ||||||||||||||||||||||||||||||||||||||||||||||||||:|:|||||||
orf101ng-1    PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAILIAVMQLWV

                     130       140       150       160       170       180
orf101-1.pep  IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ
              ||||||||||||||||||||||||||||||:|||||||||||||||||||||||||||||
orf101ng-1    IPWAELRSREYAEILKQKQELSLVEAGEFNNLGKRNGRVYFVETFDTESGIMKNLFLREQ

                     190       200       210       220       230       240
orf101-1.pep  DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL
              ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf101ng-1    DKNGGDNIIFAKEGNFSLKDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL

                     250       260       270       280       290       300
orf101-1.pep  IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI
              |||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||
orf101ng-1    IDPVSHRRTISTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI

                     310       320       330       340       350       360
orf101-1.pep  LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA
              ||||||||||||||||||||||||||||||||||||||||::|::|||||||||||||||
orf101ng-1    LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA

                     370
orf101-1.pep  VGKSLTLKGGKX
              ||||||||||||
orf101ng-1    VGKSLTLKGGKX



Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 60 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 507>:

1 GGTGGTGGTT TTATCAATGC TTCCTGTGCC ACTTTGACGA CAGCCAAACC
51 gcaatatcaa GCAGGAGACC ttagcgcttt taagataagg caaggcaatg
101 TTGTAATCGC CGGACACGGT TTGGATGCAC GTGATACCGA TTACACACGT
151 ATTCTCAGTT ATCATTCCAA AATCGATGCA CCCGTATGGG GACAAGATGT
201 TCGTGTCGTC GCGGGACAAA ACGATGTGGC CGCAACAGGT GATGCACATT
251 CGCCTATTCT CAATAATGCT GCTGCCAATA CGTCAAACAA TACAGCCAAC
301 AACGGCACAC ATATCCCTTT ATTTGCGATT GATACAGGCA AATTAGGAGG
351 TAT.GTATGC CAACAAAATC ACCTTGATCA GTACGGTCGA GCAAGCAGGC
401 ATTCGTAA

This corresponds to the amino acid sequence <SEQ ID 508; ORF113>:

1 GGGFINASCA TLTTAKPQYQ AGDLSAFKIR QGNVVIAGHG LDARDTDYTR
51 ILSYHSKIDA PVWGQDVRVV AGQNDVAATG DAHSPILNNA AANTSNNTAN
101 NGTHIPLFAI DTGKLGGXVC QQNHLDQYGR ASRHS*

Computer analysis of this amino acid sequence gave the following results: 
Homology with the pspA putative secreted protein of N. meningitidis (accession AF030941)
ORF113 and pspA show 44% aa identity in 179aa overlap:

orfU3 GGGFINASCATLTTAKPQYQAGDLSAFKIRQGNWIAGHGLDARDTDYTRILSYHSKIDA 60 

GGG INA+ TLT+ P G+L+ F + G WI G GLD D DYTRILS ++I+A 
pspa gggliNAASVTLTSGVPVLNNGNLTGFDVSSGKWIGGKGLDTSDADYTRILSRAAEINA 256 

25 orfU3 PVWGQDVRWAGQNDVMTGDAHSPILXXXXXXXXXXXXXXGTHIPLFAIDTGKLGGMYA 120 
VWG+DV+W+G+N + G + p AIDT LGGMYA 

pspa GVWGKDVKWSGKNKLDFDG SIAKTASAPSSSDSVTPTVAIDTATLGGMYA 307 

orfll3 NKITL T STVEQAGIRNQGQWFASAGNVAVNAEGKLVNTGMIAATGENHAVSLHARNVHN 179 
+KITLIST A IRN+G+ FA+ G V ++A+GKL N+G I A +++ A+ V N 

DKITLISTDNGAVIRNKGRIFAATGGVTLSADGKLSNSGSIDAA EITISAQTVDN 362 



30 



45 



pspa 



Homology with a predicted ORF from N. gonorrhoeae

ORF113 shows 86.5% identity in 52aa overlap at the N-terminal part and 94.1% identity in 17aa
overlap at the C-terminal part with a predicted ORF (ORF113ng) from N. gonorrhoeae:

GGGFINASCATLTTAKPQYQAGDLSAFKIR 30 

OZt11 ^ ill lilt) illll::IIMMI:l:im 

SHPSQLNGYIEVGGRRAEWIANPAGI AVNGGGFINASRATLTTGQPQYOAGDFSGFKIR 2 2 4 



orfX13ng 



40 orfl!3 QGNWI AGHGLDARDTDYTRI LS YH S KI DAPVW GQDVRWAGQN DVAATG DAHS P I LNNA 

I II: I I Mill II I Ml: I II I _ _ 

or f 1 1 3ng qgNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263 



90 



~, , , IDTGKLGGXVCQQNHLDQYGRASRHS 135 

° rt113 |||lllllllll:IIM 
orfll3ng DFSGFKIRQGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263 

The complete length ORF113ng nucleotide sequence <SEQ ID 509> is predicted to encode a
protein having amino acid sequence <SEQ ID 510>:

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH
51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP
101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL
151 TRGEARVVVN QINSSHPSQL NGYIEVGGRR AEVVIANPAG IAVNGGGFIN
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ
251 NHLDQYGRTS RHS*

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 61 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 511>:

1 ..TCAACGGGAC ATAGCGAACA AAATTACACT TTGCCGCGAG AAATCACACG
51 CAACATTTCA CTGGGTTCAT TTGCCTATGA ATCGCATCGC AAAGCATTAA
101 GCCATCATGC GCCCAGCCAA GGCACTGAGT TGCCGCAAAG CAACGGTATT
151 TCGCTACCCT ATACGTCCAA TTCTTTTACC CCATTACCCA GCAGCAGCTT
201 ATACATTATC AATCCTGTCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC
251 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCtGGACAGC
301 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA
351 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC
401 GTTTAGAcGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT
451 AATGGCGCGA CTGCGGCACG TTcGATGAAT CTCAGCGTTG GCATTGCATT
501 AAGTGCCGAG CAAGTAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC
551 AAAAAGAAGT TAAGCTTCCT GATGGCGGCA CACAAACCGT ATTGGTSCCA
601 CAGGTTTATG TACGCGTTAA AAATGGCGAC ATAGACGGTA AAGGTGCATT
651 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT
701 CAGGCACGAT TGCAGGgCGC AATGCGCTTA TTATCAATAC CGATACGCTA
751 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC
801 ACAAGACATC AATAATATTG GCGGCATGCT TTCTGCCGAA CAGACATTAT
851 TGCTCAACGC AGGCAACAAC ATCAACAGCC AAAGCACCAC CGCCAGCAGT
901 CAAAATACAC AAGGCAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA
951 TATCACAGGC AAAGAAAAAG GTGTTT..

This corresponds to the amino acid sequence <SEQ ID 512; ORF115>:

1 ..STGHSEQNYT LPREITRNIS LGSFAYESHR KALSHHAPSQ GTELPQSNGI
51 SLPYTSNSFT PLPSSSLYII NPVNKGYLVE TDPRFANYRQ WLGSDYMLDS
101 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD
151 NGATAARSMN LSVGIALSAE QVAQLTSDIV WLVQKEVKLP DGGTQTVLVP
201 QVYVRVKNGD IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL
251 DNIGGRIHAQ KSAVTATQDI NNIGGMLSAE QTLLLNAGNN INSQSTTASS
301 QNTQGSSTYL DRMAGIYITG KEKGV..

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N. meningitidis (accession number AF030941)
ORF115 and pspA protein show 50% aa identity in 325aa overlap:

40 OrfllS: 1 STGHSEQNYTLPREITRNISLGSFAYESHRKALSHHAPSQGTELPQSNGISLPYTSNSFT 60 

STG+S Y E++ +1 +G AY+ + + P + NGI +T 
STG YSRS P YE PAPE VS -SI RMG I S AYKG YAPQQAS DI PGT W PWAENG I H PT FT 831 

PLPSSSLYIINPVNKGYLVETDPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQR 120 
45 LP+SSL+ I P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++RKRLGDGYYEQ+ 

-LPNSSLFAIAPNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQK 890 



55 



OrfllS: 


1 


pspA: 


778 


OrfllS: 


61 


pspA: 


832 


Orfll5: 


121 


pspA: 


891 


Orfll5: 


181 


pspA: 


951 



LINEQIAELTGHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIV 180 
L+NEQIA+LTG+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQVA+LTSDIV 
50 pspA: 891 LVNEQIAKLTGYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIV 950 



WLVQKEVKLPDGGTQTVLVPQVYVRVKNGDIDGKGALLSGSNTQINVSGSLKN-SGTIAG 239 
WL + V LPDG TQTVL P+VYVR + D++G+GALLSGS I SG+++N G I AG 
WLENETVTLPDGTTQTVLKPKVYVRARPKDMNGQGALLSGSVVDIG-SGAIENRGGLIAG 1009
Orf115: 240 RNALIINTDTLDNIGGRIHAQKSAVTATQDINNIGGMLSAEQTLLLNAGXXXXXXXXXXX 299

R ALI+N +N+G++ ADING+ae LLL A 

pspA: 1010 REALILNAQNIKNLQGDLQGKNIFAAAGSDITNTGS-IGAENALLLKASNNIESRSETRS 1068 

5 OifH5: 300 XXXXXXXXXYLDRMAGIYITGKEKG 324 

+ R+AGIY+TG++ G 
pspA: 1069 NQNEQGSVRNIGRVAGIYLTGRQNG 1093 
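
The identity figures quoted for these alignments are the fraction of aligned columns in which both sequences carry the same residue. A minimal sketch of that calculation follows. It assumes the two rows are already aligned and of equal length, that '-' marks a gap, and that the denominator is the full overlap length; the alignment programs used for the original analysis may treat gaps or ambiguous residues differently. The function name `percent_identity` is illustrative only.

```python
def percent_identity(row_a: str, row_b: str) -> tuple[int, int, float]:
    """Return (identical columns, overlap length, percent identity)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    if not row_a:
        raise ValueError("aligned rows must not be empty")
    # A column counts as identical only when both rows hold the same residue
    # and that residue is not a gap character.
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    overlap = len(row_a)
    return identical, overlap, 100.0 * identical / overlap


# Short excerpt from the ORF115 / pspA alignment above (illustration only).
orf115_part = "GHRRLDGYQNDEEQFKALMDNG"
pspa_part = "GYRRLDGYTNDEEQFKALMDNG"
identical, overlap, pct = percent_identity(orf115_part, pspa_part)
print(f"{identical}/{overlap} identical ({pct:.1f}%)")
```

Applied to a whole alignment, the same count over every column gives the overall percentage reported for the overlap.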

Homology with a predicted ORF from N. gonorrhoeae
ORF115 shows 91.9% identity over a 334aa overlap with a predicted ORF (ORF115ng) from
N. gonorrhoeae:

orf115.pep                                 STGHSEQNYTLPREITRNISLGSFAYESHRK 31
                                            ||| |||||||:||||:||||||||||| |
orf115ng      NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK 71

orf115.pep    ALSHHAPSQGTELPQSN----------GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 81
              |||:|||||||||||||          ||||||| |||||||:||||||||:||||||||
orf115ng      ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 131

orf115.pep    DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 141
              ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
orf115ng      DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 191

orf115.pep    EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ 201
              ||||||||||||||||||||||||||||||:||||||||||||||||||||||||||:||
orf115ng      EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 251

orf115.pep    VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 261
              |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf115ng      VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 311

orf115.pep    SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK 321
              ||||||||||||||:||||||||||||||||:|||: ||||:||||||||||||||||||
orf115ng      SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK 371

orf115.pep    EKGV 325
              ||||
orf115ng      EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 431



An ORF115ng nucleotide sequence <SEQ ID 513> was predicted to encode a protein having amino
acid sequence <SEQ ID 514>:

1 MLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT
51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI
101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS
151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD
201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP
251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL
301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS
351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT
401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL
451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG
501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI
551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS
601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ
651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL
701 MPWRLPMQVG RLFKQAKAPK K*

Further work revealed the following partial gonococcal DNA sequence <SEQ ID 515>:

1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG
51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG
101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT
151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA
201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT
251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT
301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT
351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC
401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC
451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA
501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC
551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT
601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT
651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC
701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA
751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT
801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT
851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA
901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC
951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT
1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT
1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA
1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA
1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC
1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA
1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA
1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG
1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG
1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC
1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC
1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC
1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG
1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT
1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG
1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG
1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC
1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT
1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG
1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA
1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC
2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG
2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA
2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA
2151 GGCGCACAAA ACTTAG



This corresponds to the amino acid sequence <SEQ ID 516; ORF115ng-1>:

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT
51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI
101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS
151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD
201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP
251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL
301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS
351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT
401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL
451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG
501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI
551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS
601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ
651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL
701 MPWRLPMQVG RPIKQAKAHK T*



This gonococcal protein (ORF115ng-1) shows 91.9% identity with ORF115 over 334aa:

orf115ng-1    NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK 71
                                            ||| |||||||:||||:||||||||||| |
orf115                                     STGHSEQNYTLPREITRNISLGSFAYESHRK 31

orf115ng-1    ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 131
              |||:|||||||||||||          ||||||| |||||||:||||||||:||||||||
orf115        ALSHHAPSQGTELPQSN----------GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 81

orf115ng-1    DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 191
              ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
orf115        DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 141

orf115ng-1    EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 251
              ||||||||||||||||||||||||||||||:||||||||||||||||||||||||||:||
orf115        EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ 201

orf115ng-1    VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 311
              |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf115        VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 261

orf115ng-1    SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK 371
              ||||||||||||||:||||||||||||||||:|||: ||||:||||||||||||||||||
orf115        SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK 321

orf115ng-1    EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 431
              ||||
orf115        EKGV 325

In addition, it shows homology with a secreted N. meningitidis protein in the database:

gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273

Score = 604 bits (1541), Expect = e-172
Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%)

Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60 

L+V T + L N++T G K + ++ G LH Y R +KG D TG+. Y E++ I 
Sbjct: 739 LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 796 

Query: 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120

+G AY+ + AP Q +++P + + NGI +T LP SSL+ I 

Sbjct: 797 MGISAYKGY APQQASDI PGTV VPWAENGIHPTFT LPNSSLFAI 840 






Query: 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 

P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT 
Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900 

Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240 

G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
Sbjct: 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

Query: 241 DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299

DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G IAGR ALI+N 
Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGSVVDIG-SGAIENRGGLIAGREALILNAQN 1019 

Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359 

+ N+G++ ADINGIAE LLL A NNI ++S +S+QN QGS 

Sbjct: 1020 IKNLQGDLQGKNIFAAAGSDITNTGSI-GAENALLLKASNNIESRSETRSNQNEQGSVRN 1078 

Query: 360 LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419 

+ R+AGIY+TG++ G + AG +1 + A +++NQS+ GQT L AG DI DT + Q 
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138 

Query: 420 EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479 

FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI + 
Sbjct: 1139 NTIFDSDNYVIWCEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198 

Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILG 539

+G + +DA K+TGRSGGG K +T ++ + A S T +GK+++L +G D + G 
Sbjct: 1199 EAGKAHTETEDAIJOrTGRSGGGIKQKOTRHLKNQNGQAVSGTUXSKEIILVSGRDITVTG 1258 




— " 540 ira^^ 598 9 

Sbict- 1259 SNIIADNHTILSAKNNIVLKAAEW "18 
Query: 599 f^T-VCS^ ™ 



Query: 659 QTYEQKGLT VAFS S PVTD 676 
1Q Q YEQKG+TVA S PV + 

Sbjct: 1379 QVYEQKGVTVAI SVPWN 1396 

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 62 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 517>:

1 TCAGGGAATA ACCTCAATGC CAAAGCTGCC GAAGTCAGCA GCGCAAACGG
51 TACACTCGCT GTGTCTGCCA ATAATGACAT CAACATCAGC GCAGGCATCA
101 ACACGACCCA TGTTGATGAT GCGTCCAAAC ACACAGGCAG AAGCGGTGGT
151 ggSaaat TAGTCATTAC CGATAAAGCC CAAAGTCATC acgaaaccgc
201 CCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG
251 ATGCCAACAT CCTTGGCAGC AATGTTATTT CCGATAATGG CACCCAGATT
301 Sagcaggca atcatgttcg cattggtaca acccaaactc aaagccaaag
351 cgaaacctat catcaaaccc agaaatcagg attgatgagt gcaggtatcg
401 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC
451 AACGAACATA CAGGCAGTAC CGTAGGCAGC TTGAAAGGCG ATACCACCAT
501 tcSSc AAACACTACG AACAAATCGG cagtaccgtt tccagcccgg
551 aaggcaacaa taccatctat gcccaaagca tagacattca agcggcacac
601 AACAAATTAA ACAGTAATAC CACCCAAACC TATGAACAAA AAGG.CTAAC
651 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA ...

This corresponds to the amino acid sequence <SEQ ID 518; ORF117>:

1 SGNNLNAKAA EVSSANGTLA VSANNDINIS AGINTTHVDD ASKHTGRSGG
51 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTQI
101 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS
151 NEHTGSTVGS LKGDTTIVAG KHYEQIGSTV SSPEGNNTIY AQSIDIQAAH
201 NKLNSNTTQT YEQKXLTVAF SSPVTDLAQQ ...

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N. meningitidis (accession number AF030941)

ORF117 and pspA protein show 45% aa identity in 224aa overlap:

orfin- 4 NLNAKAAEVSSANGTLAVSANNDINISAGINTTHVDDASKHTGRSGGGNKLVITDKAQSH 63 
Ottll/. 4 """^,3 G L ++A DI + AG T +DA K+TGRSGGG K +T ++ 
40 pspA: 1173 MRIrSgIeQGrWgRDIKVEAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQ 1232 

0r£U7: 64 HETAQSSTFEGKQVVWAGNDANII^^ 123 
+ AST +GK+++L +G D + GSN+I+DN T + A N++ + +T+S+S ++ 
45 pspA: 1233 NGQAVSGTLMIffillLVSGRDITVTGSNIIADNHTILSAKNNIVLKAAETRSRSAEMNKK 1292 

Orfin: 124 QKSGLM-SAGIGFTIGSKTOTQENQSQSNEHTGSTVGS^GD^ 182 
u " +v<;rT>i <5 GIGFT GSK +TQ N+S++ HT S VGSL G+T I AGKHY Q GST+SS 

PSP A: 1293 SS^S!^^ 1352 



50 



Or£U7: 183 PEGNNTIYAQSIDIQAAHNKl^SNTTQTYEQKXLTVAFSSPVTD 226 

P+G+ I + I I AA N+ + + Q YEQK +TVA S PV + 
pspA- 1353 POGDVGISSGKISIDAAQNRYSQESKQVYEQKGVTVAISVPWN 1396 




Homology with a predicted ORF from N. gonorrhoeae

ORF117 shows 90% identity over a 230aa overlap with a predicted ORF (ORF117ng) from
N. gonorrhoeae:

orf117.pep                                  SGNNLNAKAAEVSSANGTLAVSANNDINIS 30
                                            ||||||||||||:||:||||| |:|||:||
orf117ng      IHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITIS 480

orf117.pep    AGINTTHVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILGS 90
              :||:: :|||||||||||||||||||||||||||||||||||||||||||||||||||||
orf117ng      SGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILGS 540

orf117.pep    NVISDNGTQIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 150
              ||||||||:|||||||||||||||||||||||||||||||||||||||||||||||||||
orf117ng      NVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 600

orf117.pep    NEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSSPEGNNTIYAQSIDIQAAHNKLNSNTTQT 210
              |||||||||||||||||||:||||| || |||||||| |   || || ||:|:|||:||||
orf117ng      NEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTTQT 660

orf117.pep    YEQKXLTVAFSSPVTDLAQQ 230
              |||| |||||||||||||||
orf117ng      YEQKGLTVAFSSPVTDLAQQAIAVAHKAAKQFDKAKTTALMPWRLPMQVGRLFKQAKAPK 720

An ORF117ng nucleotide sequence <SEQ ID 519> was predicted to encode a protein having amino
acid sequence <SEQ ID 520>:

1 ..LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT
51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI
101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS
151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD
201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP
251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL
301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS
351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT
401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL
451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG
501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI
551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS
601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ
651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL
701 MPWRLPMQVG RLFKQAKAPK K*

Further work revealed the following gonococcal partial DNA sequence <SEQ ID 521>:

1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG
51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG
101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT
151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA
201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT
251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT
301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT
351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC
401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC
451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA
501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC
551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT
601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT
651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC
701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA
751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT
801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT
851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA
901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC
951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT
1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT
1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA
1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA
1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC
1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA
1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA
1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG
1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG
1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC
1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC
1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC
1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG
1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT
1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG
1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG
1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC
1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT
1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG
1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA
1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC
2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG
2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA
2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA
2151 GGCGCACAAA ACTTAG



This corresponds to the amino acid sequence <SEQ ID 522; ORF117ng-1>:

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT
51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI
101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS
151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD
201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP
251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL
301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS
351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT
401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL
451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG
501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI
551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS
601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ
651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL
701 MPWRLPMQVG RPIKQAKAHK T*


ORF117ng-1 shows the same 90% identity over a 230aa overlap with ORF117. In addition, it
shows homology with a secreted N. meningitidis protein in the database:

gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273

Score = 604 bits (1541), Expect = e-172
Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%)

Query: 1    LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60
            L+V T +  L N++T G K + ++ G LH Y R  +KG D TG+    Y    E++  I
Sbjct: 739  LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 796

Query: 61   LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120
            +G AY+ + AP Q +++P + + NGI +T LP SSL+ I
Sbjct: 797  MGISAYKGY APQQASDIPGTV VPWAENGI H PT FT LPNSSLFAI 840

Query: 121  NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180
             P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT
Sbjct: 841  APNNKGYLIETDPAFTDYRKWLGSGYM1JUVLQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900

Query: 181  GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240
            G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL +  V LP
Sbjct: 901  GY^LrcYTNDEEQFKAI^DNGITIAKELQLTPGI^ 960

Query: 241  DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299
            DG TQTVL P+VYVR +   ++G+GALLSGS   I  SG+++N  G IAGR ALI+N
Sbjct: 961  DGTTQTVLKPKVYVRARPKDMNGQGALLSGSVVDIG-SGAIENRGGLIAGREALILNAQN 1019

Query: 300  LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359
            + N+ G + + A DI N G I AE LLL A NNI ++S +S+QN QGS
Sbjct: 1020

Query: 360  LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419
            + R+AGIY+TG++ G + AG +1 + a +++NQS+ GQT L AG DI DT + Q
Sbjct: 1079

Query: 420  EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479
            FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DL +
Sbjct: 1139

Query: 480  SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILG 539
            +G + +DA K+TGRSGGG K +T ++ + AST +GK+++L +G D + G
Sbjct: 1199

Query: 540  SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598
            SN+I+DN T + A N++ +   +T+S+S   ++ +KSGLM S GIGFT GSK +TQ N+S
Sbjct: 1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318

Query: 599  QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658
            ++  HT S VGSL G+T I A KHY QTGS +SSP+G+  IS+  + I AAQN+ + ++
Sbjct: 1319 ETVSHTESVVGSLNGNTLISAGKHYTQTGSTISSPQGDVGISSGKISIDAAQNRYSQESK 1378

Query: 659  QTYEQKGLTVAFSSPVTD 676
            Q YEQKG+TVA S PV +
Sbjct: 1379 QVYEQKGVTVAISVPVVN 1396
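
The start and end coordinates printed on each alignment line can be checked against the residues on that line. The following is a minimal sketch in Python, assuming gaps are written as "-" and that the first Query/Sbjct pair of the alignment above has been transcribed correctly; the function name is illustrative only and is not part of any alignment program.

```python
def residues_spanned(aligned, start, end):
    """Return (residues counted on the line, residues implied by the coordinates)."""
    # Gap characters occupy a column but do not consume a residue.
    counted = sum(1 for ch in aligned if ch != "-")
    implied = end - start + 1
    return counted, implied


# First block of the alignment, as transcribed here (an assumption of this sketch).
query = "LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS"
sbjct = "LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR"

print(residues_spanned(query, 1, 60))
print(residues_spanned(sbjct, 739, 796))
```

When the two numbers returned for a line agree, the line is internally consistent; a disagreement points to a lost gap character or a misread residue in that line.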

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 63 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 523>:

1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA
51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG
101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAwAACCAG CCATGTCCGC
151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC
201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGyCATGCGC AACCTGCAAG
251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG
301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA
351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCAACGAAAC
401 CTGCCGACGC GTCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA
451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAT CCTGGTTTGA
501 CGTGCGCATC GACTTCATCT CCTAT...



This corresponds to the amino acid sequence <SEQ ID 524; ORF119>:

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSXTSHVR
51 DGKPSGGSVM MPKPQPAVKK TAKPQDPXMR NLQEQDAVYI AKQKQAKASP
101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS ATKPADASAK PAPVPQTPAK
151 PLITLKELSK VELSWFDVRI DFISY...
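
The two X residues in the partial ORF119 sequence line up with the lower-case ambiguity codes (w, y) in SEQ ID 523. The following is a minimal Python sketch of how a translation with the standard genetic code might produce that result. It assumes the usual IUPAC nucleotide codes and reports X whenever the possible codons encode more than one amino acid; the function and table names are illustrative and are not taken from the specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate(dna):
    """Translate in frame 1; an ambiguous codon gives X unless all readings agree."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[pos:pos + 3]
        readings = {
            CODON_TABLE["".join(bases)]
            for bases in product(*(IUPAC[base] for base in codon))
        }
        protein.append(readings.pop() if len(readings) == 1 else "X")
    return "".join(protein)


print(translate("ATGATTTACATCGTACTGTTT"))  # first seven codons of SEQ ID 523
print(translate("AACAGCAwAACCAGC"))        # codons around the "w" at position 104
print(translate("CCCGyCATGCGC"))           # codons around the "y" at position 233
```

In this sketch the codon AwA can be read as AAA (Lys) or ATA (Ile), and GyC as GCC (Ala) or GTC (Val), so both positions are reported as X.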

Further work revealed the complete nucleotide sequence <SEQ ID 525>:

1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA
51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG
101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC
151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC
201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGCCATGCGC AACCTGCAAG
251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG
301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA
351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC
401 CTGCCGACGC GCCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA
451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAC CCTGGTTTGA
501 CGTGCGCTTC GACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC
551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC
601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG
651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG
701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA CGCATTCGCA
751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA
801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG
851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC
901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA
951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG
1001 AGCCGTTTAC CAACGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT
1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA
1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC
1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG
1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA
1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA

This corresponds to the amino acid sequence <SEQ ID 526; ORF119-1>:

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR
51 DGKPSGGSVM MPKPQPAVKK TAKPQDPAMR NLQEQDAVYI AKQKQAKASP
101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS APKPADAPAK PAPVPQTPAK
151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG
201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA
251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS
301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS
351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV
401 RTYVLARQSE MLKVGIEPGG KTALRLFS*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF119 shows 93.7% identity over a 175aa overlap with an ORF (ORF119a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf119.pep  MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM
            |||||||||:|||||||||||||||||||||||||||||||||| ||||||||||| |||
orf119a     MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf119.pep  MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH
            ||||||||||||| ||| ||||||||||||||||||||||||||||||||||||||||||
orf119a     MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH
                    70        80        90       100       110       120

                   130       140       150       160       170
orf119.pep  TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY
            || |||||||| ||||| |||:||||||||||||||||||||| |||||:|||||
orf119a     TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
                   130       140       150       160       170       180

orf119a     AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
                   190       200       210       220       230       240
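
The identity figure quoted for this overlap can be re-derived by counting matching positions. The sketch below is illustrative Python, not the alignment program used in the specification. It assumes the overlap is ungapped, treats the ambiguous residue X as a mismatch, and uses the ORF119 and ORF119a sequences as transcribed from SEQ ID 524 and SEQ ID 528; the function name is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Count identical positions over the ungapped overlap of two sequences."""
    overlap = min(len(seq_a), len(seq_b))
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches, overlap, round(100.0 * matches / overlap, 1)


# Partial ORF119 (175 residues) as transcribed from SEQ ID 524.
ORF119 = (
    "MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM"
    "MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH"
    "TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY"
)
# First 180 residues of ORF119a as transcribed from SEQ ID 528.
ORF119A = (
    "MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM"
    "MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH"
    "TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE"
)

print(percent_identity(ORF119, ORF119A))
```

Under these assumptions the count is 164 identical positions out of 175, which agrees with the 93.7% stated above.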

The complete length ORF119a nucleotide sequence <SEQ ID 527> is:

1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA
51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG
101 GGCACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC
151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC
201 GGTCAAAAAA ACGGCAAAAT CCCAAGACCC CGCCATGCGC AACCTGCAAG
251 AGCAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG
301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA
351 CTCCGCCCAC ACCGTTCCCG AACCCCAAAC CGGACATTCC GCACCAAAAC
401 CTGCCGACGC GCCGGCAAAA CCTGTTCCCG TTCCGCAAAC GCCGGCAAAA
451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA
501 CGTGCGCTTC GACTTCATCT CTTATATCGC GCTGACCGAA GCCAAAGAAC
551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC
601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG
651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG
701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA TGCATTCGCA
751 CACAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA
801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACTATCG
851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC
901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA
951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG
1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTATAA AGGCTTCAGT
1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA
1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC
1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG
1201 CGCACTTATG TATTGGCTCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA
1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA



This encodes a protein having amino acid sequence <SEQ ID 528>:

1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR
51 DGKPSGGPVM MPKPQPAVKK TAKSQDPAMR NLQEQDAVYI AKQKQAKASP
101 FKTEIETALE ESGIIGNSAH TVPEPQTGHS APKPADAPAK PVPVPQTPAK
151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG
201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA
251 HSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS
301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS
351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV
401 RTYVLARQSE MLKVGIEPGG KTALRLFS*



ORF119a and ORF119-1 show 98.6% identity in 428 aa overlap:

                    10        20        30        40        50        60
orf119a.pep MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM
            |||||||||:|||||||||||||||||||||||||||||||||||||||||||||| |||
orf119-1    MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf119a.pep MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH
            ||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf119a.pep TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
            || ||||||||||||||||||:||||||||||||||||||||||||||||||||||||||
orf119-1    TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf119a.pep AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf119a.pep AFNRQVDAFAHSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
            ||||||||||:|||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf119a.pep AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf119a.pep GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG
                   370       380       390       400       410       420

orf119a.pep KTALRLFSX
            |||||||||
orf119-1    KTALRLFSX

Homology with a predicted ORF from N. gonorrhoeae

ORF119 shows 93.1% identity over a 175aa overlap with a predicted ORF (ORF119ng) from
N. gonorrhoeae:

orf119.pep  MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM 60
            |||||||||:|||||||||||||||||||||||||||||||||| ||||||||||| |||
orf119ng    MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 60

orf119.pep  MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 120
            |||||||||| |||||  ||||||||||||||||||||||||||||||||| ||||||||
orf119ng    MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH 120

orf119.pep  TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 175
            ||||||||||| ||||| |||:||||||||||||||||||||| |||||:|||||
orf119ng    TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 180

The complete length ORF119ng nucleotide sequence <SEQ ID 529> is: 

1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA CCGGCCAAAC CCCAAGACTC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAATCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

401 CTGCCGACGC GCCGGCAAAA CCCGTTCCCG TTCCGCAAAC GCCGGCAAAA 

451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA 

501 CGTGCGCTtc gACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC

551 TGCACGCACT GCCGCGCCTT tCCAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 

651 CTATCAGGCA TTTATCGTGG GTATCCAGGC AGTCAGCCGC AACGGACTTG 

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGCGGA CGCATTCGCA 

751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 

851 CCATCCATTT GGTTTCGCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGTCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTA 

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCCC TGCGCCTGTT TTCATAA

This encodes a protein having amino acid sequence <SEQ ID 530>: 

1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR
51 DGKPSGGPVM MPKPQPAVKK PAKPQDSAMR NLQEQDAVYI AKQKQAKASP
101 FKTEIETALE EIGIIGNSAH TVSEPQTGHS APKPADAPAK PVPVPQTPAK
151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG
201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQADAFA
251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS
301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS
351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV
401 RTYVLARQSE MLKVGIEPGG KTALRLFS*

ORF119ng and ORF119-1 show 98.4% identity over 428 aa overlap:

                    10        20        30        40        50        60
orf119ng    MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM
            |||||||||:|||||||||||||||||||||||||||||||||||||||||||||| |||
orf119-1    MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf119ng    MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH
            |||||||||| ||||| |||||||||||||||||||||||||||||||||| ||||||||
orf119-1    MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf119ng    TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
            |||||||||||||||||||||:||||||||||||||||||||||||||||||||||||||
orf119-1    TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf119ng    AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf119ng    AFNRQADAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
            |||||:||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf119ng    AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf119ng    GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG
                   370       380       390       400       410       420

orf119ng    KTALRLFSX 429
            |||||||||
orf119-1    KTALRLFSX

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 531>:

1 .GCGCGGCACG GCACGGAAGA TTTCTTCATG AACAACAGCG ACAC.ATCAG
51 GCAGATAGTC GAAAGCACCA CCGGTACGAT GAAGCTGCTG ATTTCCTCCA
101 TCGCCCTGAT TTCATTGGTA GTCGGCGGCA TCGGCGTGAT GAACATCATG
151 CTGGTGTCCG TTACCGAGCG CACCAAAGAA ATCGGCATAC GGATGGCAAT
201 CGGCGCGCGG CGCGGCAATA TTTyGCAGCA GTTTTTGATT GAGGCGGTGT
251 TAATCTGCGT CATCGGCGGT TTGGTCGGCG TGGGTTTGTC CGCCGCCGTC
301 AGCCTCGTGT TCAATCATTT TGTAACCGAC TTCCCGATGG ACATTTCCGC
351 CATGTCCGTC ATCGGCGCGG TCGCCTGTTC GACCGGAATC GGCATCGCGT
401 TCGGCTTTAT GCCTGCCAAT AAAGCAGCCA AACTCAATCC GATAGACGCA
451 TTGGCACAGG ATTGA



This corresponds to the amino acid sequence <SEQ ID 532; ORF134>: 

1 ARHGTEDFFM NNSDXIRQIV ESTTGTMKLL ISSIALISLV VGGIGVMNIM 

51 LVSVTERTKE IGIRMAIGAR RGNIXQQFLI EAVLICVIGG LVGVGLSAAV

101 SLVFNHFVTD FPMDISAMSV IGAVACSTGI GIAFGFMPAN KAAKLNPIDA 

151 LAQD* 






Further work revealed the complete nucleotide sequence <SEQ ID 533>: 

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT 

51 GCTCGGCATC ATCATCGGTA TCGCGTCGGT GGTTTCCGTC GTCGCATTGG 

101 GCAATGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG 

151 AACACCATCA GCATCTTCCC GGGGCGCGGC TTCGGCGACA GGCGCAGCGG

201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT 

301 TACCGCAACA CCGACCTGAC CGCCTCGCTT TACGGCGTGG GCGAACAATA 

351 TTTCGACGTG CGCGGACTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA 

401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA

701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC 

751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA 

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT

951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC 

1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGACG 

1151 CATTGGCACA GGATTGA
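
Each row of a sequence listing starts with the position of its first residue, so the running length can be used to check that no block was dropped or duplicated during transcription. The following Python sketch shows one way to do this. It assumes rows of the form "position block block ...", optionally preceded by a page margin line number, and the function name is illustrative only.

```python
def read_numbered_rows(rows):
    """Join numbered sequence rows and verify each row's stated start position."""
    sequence = ""
    first = None
    for row in rows:
        tokens = row.split()
        if not tokens:
            continue  # skip blank separator lines
        # Leading integers: an optional margin line number, then the position.
        numbers = []
        while tokens and tokens[0].isdigit():
            numbers.append(int(tokens.pop(0)))
        if not numbers:
            raise ValueError("row has no position number: %r" % row)
        position = numbers[-1]
        if first is None:
            first = position
        expected = first + len(sequence)
        if position != expected:
            raise ValueError("row starts at %d, expected %d" % (position, expected))
        sequence += "".join(tokens)
    return first, sequence


# Last two rows of SEQ ID 533 as transcribed here, with one margin number left in.
rows = [
    "1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGACG",
    "",
    "25 1151 CATTGGCACA GGATTGA",
]
first, seq = read_numbered_rows(rows)
last = first + len(seq) - 1
print(first, last, last % 3 == 0, seq[-3:])
```

For these two rows the sequence ends at position 1167, a multiple of three, and its last codon is TGA, which is consistent with an open reading frame of 388 codons followed by a stop codon.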

This corresponds to the amino acid sequence <SEQ ID 534; ORF134-1>:

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT
51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT
101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK
151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM
201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI
251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA
301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS
351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*

Computer analysis of this amino acid sequence gave the following results:

Homology with the hypothetical protein o648 of E. coli (accession number AE000189)
ORF134 and o648 protein show 45% aa identity in 153aa overlap:

Orf134: 2   RHGTEDFFMNNSDXIRQIVESTTGTMKXXXXXXXXXXXVVGGIGVMNIMLVSVTERTKEI 61
            RHG +DFF  N D + + VE TT T++           VVGGIGVMNIMLVSVTERT+EI
o648:   496 RHGKKDFFTWNMDGVLKTVEKTTRTl^LFLTLVAVISLVVGGIGVMNIMLVSVTERTREI 555

Orf134: 62  GIRMAIGARRGNIXQQFLIEAXXXXXXXXXXXXXXXXXXXXXFNHFVTDFPMDISAMSVI 121
            GIRMA+GAR  ++ QQFLIEA                        F+  + +  S ++++
o648:   556 GIRMAVGARASDVLQQFLIEAVLVCLVGGALGITLSLLIAFTLQLFLPGWEIGFSPLALL 615

Orf134: 122 GAVACSTGIGIAFGFMPANKAAKLNPIDALAQD 154
             A  CST  GI FG++PA  AA+L+P+DALA++
o648:   616 LAFLCSTVTGILFGWLPARNAARLDPVDALARE 648



Homology with a predicted ORF from N. meningitidis (strain A)

ORF134 shows 98.7% identity over a 154aa overlap with an ORF (ORF134a) from strain A of N.
meningitidis:

                                                  10        20        30
orf134.pep                                ARHGTEDFFMNNSDXIRQIVESTTGTMKLL
                                          |||||||||||||| |||||||||||||||
orf134a     GESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTEDFFMNNSDSIRQIVESTTGTMKLL
               210       220       230       240       250       260

                    40        50        60        70        80        90
orf134.pep  ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG
            ||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||
orf134a     ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICVIGG
               270       280       290       300       310       320

                   100       110       120       130       140       150
orf134.pep  LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134a     LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA
               330       340       350       360       370       380

orf134.pep  LAQDX
            |||||
orf134a     LAQDX


The complete length ORF134a nucleotide sequence <SEQ ID 535> is:

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT
51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCATTGG
101 GCAACGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG
151 AACACCATCA GCATCTTCCC AGGGCGCGGC TTCGGCGACA GGCGCAGCGG
201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA
251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT
301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA
351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA
401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA
451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG
501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT
551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG
601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA
651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA
701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC
751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC
801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA
851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA
901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT
951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG
1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC
1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC
1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGATG
1151 CATTGGCGCA GGATTGA



This encodes a protein having amino acid sequence <SEQ ID 536>: 

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT
51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT
101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK
151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM
201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI
251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA
301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS
351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*

ORF134a and ORF134-1 show 100.0% identity in 388 aa overlap:

orf134a.pep MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1    MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG

orf134a.pep FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1    FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV

orf134a.pep RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1    RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD

orf134a.pep ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1    ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE

orf134a.pep DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1    DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA

orf134a.pep IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1    IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC

orf134a.pep STGIGIAFGFMPANKAAKLNPIDALAQDX
            |||||||||||||||||||||||||||||
orf134-1    STGIGIAFGFMPANKAAKLNPIDALAQDX



Homology with a predicted ORF from N. gonorrhoeae

ORF134 shows 96.8% identity over a 154aa overlap with a predicted ORF (ORF134.ng) from N.
gonorrhoeae:

orf134.pep                                ARHGTEDFFMNNSDXIRQIVESTTGTMKLL 30
                                          |||||||||||||| ||:||||||||||||
orf134ng    GESHTNSITVKIKDNANTRVAEKGLAELLKARHGTEDFFMNNSDSIRQMVESTTGTMKLL 264

orf134.pep  ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG 90
            ||||||||||||||||||||||||||||||||||||||||||| |||||||||||:||||
orf134ng    ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICIIGG 324

orf134.pep  LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 150
            |||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||
orf134ng    LVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 384

orf134.pep  LAQD 154
            ||||
orf134ng    LAQD 388



The complete length ORF134ng nucleotide sequence <SEQ ID 537> is:

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACCAT
51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCGCTGG
101 GCAACGGTTC GCAGAAAAAA ATCCTCGAAG ACATCAGTTC GATGGGGACG
151 AACACCATCA GCATCTTCCC CGGGCGCGGC TTCGGCGACA GGCGCAGCGG
201 CAAAATCAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA
251 GCTACGTTGC CTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACC
301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA
351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGATGAGA
401 ACGATGTGAA AGAAGACGCG CAAGTCGTCG TCATCGACCA AAATGTCAAA
451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG
501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT
551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG
601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA
651 AGACAATGCC AATACCCGGG TTGCCGAAAA AGGGCTGGCC GAGCTGCTCA
701 AAGCACGGCA CGGCACGGAA GACTTCTTTA TGAACAACAG CGACAGCATC
751 AGGCAGATGG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC
801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGTGTG ATGAACATTA
851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA
901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT
951 GTTAATCTGC ATCATCGGAG GCTTGGTCGG CGTAGGTTTG TCCGCCGCCG
1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ATTTCCCGAT GGACATTTCG
1051 GCGGCATCCG TTATCGGGGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC
1101 GTTCGGCTTT ATGCCTGCCA ATAAGGCAGC CAAACTCAAT CCGATAGATG
1151 CATTGGCGCA GGATTGA



This encodes a protein having amino acid sequence <SEQ ID 538>:

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSMGT
51 NTISIFPGRG FGDRRSGKIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT
101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK
151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM
201 HQITGESHTN SITVKIKDNA NTRVAEKGLA ELLKARHGTE DFFMNNSDSI
251 RQMVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA
301 IGARRGNILQ QFLIEAVLIC IIGGLVGVGL SAAVSLVFNH FVTDFPMDIS
351 AASVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*

ORF134ng and ORF134-1 show 97.9% identity in 388 aa overlap:



orf134ng    MSVQAVLAHKMRSLLTMLGIIIGIASWSWALGNGSQKKILEDISSMGTNTISIFPGRG
orf134-1    MSVQAVIAHKMRSIXTMLGIIIGIASWSWALGNGSQKKILEDISSIGTNTISIFPGRG

orf134ng    FGDRRSGKIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV
orf134-1    FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV

orf134ng    RGLKLETGRLFDENDVKEDAQVWIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD
orf134-1    RGLKLETGRLFDENDVKEDAQVWIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD

orf134ng    ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTRVAEKGLAELLKARHGTE
orf134-1    ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE

orf134ng    DFFMNNSDSIRQMVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA
orf134-1    DFFMNNSDSIRQIVESTTGTMKLLISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMA

orf134ng    IGARRGNILQQFLIEAVLICIIGGLVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVAC
orf134-1    IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC

orf134ng    STGIGIAFGFMPANKAAKLNPIDALAQDX
orf134-1    STGIGIAFGFMPANKAAKLNPIDALAQDX
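
An identity figure of the kind quoted for this alignment can be recomputed from two aligned rows. The following is a minimal sketch only: the source does not state which alignment program or counting convention produced the 97.9% value, so the function name `percent_identity` and the convention used (identical, non-gap columns divided by the number of aligned columns) are assumptions made for illustration.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumed convention (not taken from the source): identical, non-gap
    columns divided by the total number of aligned columns.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    if not row_a:
        raise ValueError("aligned rows must not be empty")
    # Count columns where both rows carry the same residue (gaps excluded).
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    return 100.0 * matches / len(row_a)


if __name__ == "__main__":
    # Hypothetical ten-column example with a single mismatch.
    print(percent_identity("MSVQAVLAHK", "MSVQAVIAHK"))
```

Other tools may divide by the shorter sequence length or exclude gapped columns from the denominator, so values computed this way need not match a published figure exactly.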



ORF134ng also shows homology to an E. coli ABC transporter:

sp|P75831|YBJZ_ECOLI HYPOTHETICAL ABC TRANSPORTER ATP-BINDING PROTEIN YBJZ >gi5
(AE000189) o648; similar to YBBA_HAEIN SW: P45247 [Escherichia coli] Length = 648

Score = 297 bits (753), Expect = 6e-80
Identities = 162/389 (41%), Positives = 230/389 (58%), Gaps = 1/389 (0%)

Query: 1   MSVQAVLAHKMRSLLTMLXXXXXXXXXXXXXXLGNGSQKKILEDISSMGTNTISIFPGRG 60
           M + +A+ A+KMR+LLTML +G+ +++ +L DI S+GTNTI ++PG+
Sbjct: 260 MAWRAIAANKMRTLLTMLGIIIGIASWSIVVVGDAAKQMVLADIRSIGTNTIDVYPGKD 319

Query: 61  FGDRRSGKIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 120
           FGD + L DD I KQ +VASATP S L Y N D+ AS GV YF+V
Sbjct: 320

Query: 121
           G+ G F++ + AQWV+D N + +LF +D +G+ IL P VIGV ++
Sbjct: 380

Query: 180
           ++ FG+S VL +W PY+T+ ++ G+S NSITV++K+ ++ AE+ L LL RHG
Sbjct: 440

Query: 240
           +DFF N D + + VE TT T++ WGGIGVMNIMLVSVTERT+EIGIRM
Sbjct: 500

Query: 300
           A+GAR ++LQQFLIE F+ + + S +++ A
Sbjct: 560

Query: 360
           CST GI FG++PA AA+L+P+DALA++
Sbjct: 620




Based on this analysis, including the presence of the leader peptide and transmembrane regions in
the gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 65 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 539>:

1 . GGGACGGGAG CGATGCTGCT GCTGTTTTAC GCGGTAACGA T . CTGCCTTT 

51 GGCCACTGGC GTTACCCTGA GTTACACCTC GTCGATTTTT TTGGCGGTAT 

101 TTTCCTTCCT GATTTTGAAA GAACGGATTT CCGTTTACAC GCAGGCGGTG 

151 CTGCTCCTTG GTTTTGCCGG CGTGGTATTG CTGCTTAATC CCTCGTTCCG 

201 CAGCGGTCAG GAAACGGCGG CACTCGCCGG GCTGGCGGGC GGCGCGATGT

251 CCGGCTGGGC GTATTTGAAA GTGCGCGAAC TGTCTTTGGC GGGCGAACCC 

301 GGCTGGCGCG TCGTGTTTTA CCTTTCCGTG ACAGGTGTGG CGATGTCGTC

351 GGTTTGGGCG ACGCTGACCG GCTGGCACAC CCTGTCCTTT CCATCGGCAG 

401 TTTATCTGTC GTGCATCGGC GTGTCCGCGC TGATTGCCCA ACTGTCGATG 

451 ACGCGCGCCT ACAAAGTCGG CGACAAATTC ACGGTTGCCT CGCTTTCCTA

501 TATGACCGTC GTTTTTTCCG CTCTGTCTGC CGCATTTTTT CTGGGCGAAG 

551 AGCTTTTCTG GCAGGAAATA CTCGGTATGT GCATCATCAT CCTCAGCGGT 

601 ATTTTGA 

This corresponds to the amino acid sequence <SEQ ID 540; ORF135>: 

1 ..GTGAMLLLFY AVTILPLATG VTLSYTSSIF LAVFSFLILK ERISVYTQAV

51 LLLGFAGWL LLNPSFRSGQ ETAALAGLAG GAMSGWAYLK VRELSLAGEP 
101 GWRWFYLSV TGVAMSSVWA TLTGWHTLSF PSAVYLSCIG VSALIAQLSM 
151 TRAYKVGDKF TVASLSYMTV VFSALSAAFF LGEELFWQEI LGMCIIISAV 
201 F* 
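
The correspondence between a nucleotide listing and its amino acid sequence can be checked by translating the DNA codon by codon. The sketch below is illustrative only: it assumes the standard genetic code, a reading frame that starts at the first base supplied, and a hypothetical helper name `translate`; codons containing unknown positions (such as the "." placeholders in the partial listing) are reported as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored, '*' marks a stop."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # Codons with ambiguous or unknown bases are not in the table.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First three ten-base blocks of the partial sequence <SEQ ID 539>.
    print(translate("GGGACGGGAG CGATGCTGCT GCTGTTTTAC"))
```

Applied to the first thirty bases of <SEQ ID 539>, this gives GTGAMLLLFY, which agrees with the start of the ORF135 amino acid sequence shown above.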

Further work revealed the complete nucleotide sequence <SEQ ID 541>:



1 ATGGATACCG [...]
51 GGCGGCCTGC [...]
101 AATTTGCCCT [...] TTTGGCGCAT GCTGTTTTCA
151 ACCGTTGCGC [...]
201 GCCCCATTGG [...]
251 TGCTGCTGCT [...]
301 ACCCTGAGTT [...]
351 TTTGAAAGAA [...]
401 TTGCCGGCGT [...]
451 ACGGCGGCAC [...]
501 TTTGAAAGTG [...]
551 TGTTTTACCT [...]
601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG
651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA
701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT
751 TTTTCCGCTC [...]
801 GGAAATACTC [...]
851 TCCGCCCCAC [...]
901 TAA



This corresponds to the amino acid sequence <SEQ ID 542; ORF135-1>:

1 MDTAKKDILG SGWMLVAAA C FTIMNVLIKE ASAKFALGSG ELVFWRMLFS 

51 TVALGAAAVL RRDXFRTPHW KNHLNRS MVG TGAMLLLFYA VTHL PLATGV 

101 T LSYTSSIFL AVFSFLIL KE RISVYTQ AVL LLGFAGWLL LNPSF RSGQE 

151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRWFYLSVT GVAMSSVWAT 

201 LTGWHTLS FP SAVYLSCIGV SALIA QLSMT RAYKVGDKFT VAS LSYMTW

251 FSALSAAFFL GEELFWQ EIL GMCIIILSGI LSSI RPTAFK QRLQSLFRQR 

301 * 

Computer analysis of this amino acid sequence gave the following results: 






Homology with a predicted ORF from N. meningitidis (strain A)

ORF135 shows 99.0% identity over a 197aa overlap with an ORF (ORF135a) from strain A of N.
meningitidis:

orf135.pep                                GTGAMLLLFYAVTILPLATGVTLSYTSSIF
orf135a     STVALGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIF

orf135.pep  LAVFSFLILKERISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLK
orf135a     LAVFSFLILKERISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLK

orf135.pep  VRELSLAGEPGWRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM
orf135a     VRELSLAGEPGWRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM

orf135.pep  TRAYKVGDKFTVASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIISAVFX
orf135a     TRAYKVGDKFTVASLSYMTWFSALSAAFFLAEELFWQEILGMCIIILSGILSSIRPTAF

orf135a     KQRLQSLFRQRX

The complete length ORF135a nucleotide sequence <SEQ ID 543> is:

1 ATGGATACCG [...]
51 GGCGGCCTGC [...]
101 AATTTGCCCT [...]
151 ACCGTTGCGC [...]
201 GCCCCATTGG [...]
251 TGCTGCTGCT [...]
301 ACCCTGAGTT [...]
351 TTTGAAAGAA [...]
401 TTGCCGGCGT [...]
451 ACGGCGGCAC [...]
501 TTTGAAAGTG [...]
551 TGTTTTACCT [...]
601 CTGACCGGCT [...]
651 CATCGGCGTG [...]
701 AAGTCGGCGA [...]
751 TTTTCCGCTC [...]
801 GGAAATACTC [...]
851 TCCGCCCCAC [...]
901 TAA



This encodes a protein having amino acid sequence <SEQ ID 544>:

1 MDTA KKDILG SGWMLVAAA C FTIMNVLIKE ASAKFALGSG ELVFWRMLFS 

51 TVALGAAAVL RRDTFRTPHW KNHLNRS MVG TGAMLLLFYA VTHL PLATGV 

101 TLSYTSSI FL AVFSFLIL KE RTSVYTQA VL LLGFAGWLL LNPSF RSGQE 

151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRWFYLSVT GVAMSSVWAT 

201 LTGWHTLS FP SAVYLSCIGV SALIA QLSMT RAYKVGDKFT VASLSYMTW

251 FSALSAAFFL AEELFWQEIL GMCIIILSGI LSSI RPTAFK QRLQSLFRQR 

301 * 

ORF135a and ORF135-1 show 99.3% identity in 300 aa overlap:

orf135a.pep MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL
orf135-1    MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL

orf135a.pep RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE
orf135-1    RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE

orf135a.pep RISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG
orf135-1    RISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG

orf135a.pep WRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT
orf135-1    WRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT

orf135a.pep VASLSYMTVVFSALSAAFFLAEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR
orf135-1    VASLSYMTWFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR

Homology with a predicted ORF from N. gonorrhoeae

ORF135 shows 97% identity over a 201aa overlap with a predicted ORF (ORF135ng) from
N. gonorrhoeae:

orf135.pep                                GTGAMLLLFYAVTXLPLATGVTLSYTSSIF 30
orf135ng    STVTLGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIF 335

orf135.pep  LAVFSFLILKERISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLK 90
orf135ng    LAVFSFLILKERISVYTQAVLLLGFAGWLLLNPSFRSGQEPAALAGLAGGAMSGWAYLK 395

orf135.pep  VRELSLAGEPGWRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 150
orf135ng    VRELSLAGEPGWRWFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSM 455

orf135.pep  TRAYKVGDKFTVASLSYMTWFSALSAAFFLGEELFWQEILGMCIIISAVF 201
orf135ng    TRAYKVGDKFTVASLSYMTWFSALSAAFFLGEELFWQEILGMCIIISAAF 506



An ORF135ng nucleotide sequence <SEQ ID 545> was predicted to encode a protein having amino
acid sequence <SEQ ID 546>:



1 MPSEKAFRRH LRTASFQGLH LHHFHQKVGK CGIIGFGIHI FPTLLPA AQG
51 ILDIQLGLFR IDFAALAVYR RTQVDFIHTV IDGIASDQAF SEWQILRRL
101 NLGHFTDTHL IAOARRFIAD FGNIRPMRRG EAKTFCRCFR FDGIDGIHGD
151 FRQCGHINRL APGKDCRNGK RDKVFFHTRH YNQVCLEKTN CSARKIKFRH
201 QKQAKTHSTS LAARFTIRPS LSQRPFMDTA KKDILGS GWM LVAAACFTVM
251 NVLI KEASAK FALGSGELVF WRMLFSTVTL GAAAVLRRDT FRTPHWKNHL
301 NRSMVGTGAM LLLFYAVTHL PLTTGVT LSY TSSIFLAVFS FLI LKERI SV
351 YTQ AVLLLGF AGWLLLNPS FRSGQEPAAL AGLAGGAMSG WAYLKVRELS
401 LAGEPGWRW FYLSATGVAM SSVWATLTGW HTLS FPSAVY LSGIGVSALI
451 AQLSMTRAYK VGDKFTVASL SYMTVVFSAX SAAFFLGEEL FWQEILGMCI
501 IISAAF*



Further work revealed the following gonococcal sequence <SEQ ID 547>:

1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC
51 GGCGGCCTGC TTCACCGTTA TGAACGTATT GATTAAAGAG GCATCGGCAA
101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA
151 ACCGTTACGC TCGGTGCTGC CGCCGTATTG CGGCGCGACA CCTTCCGCAC
201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA
251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGAC AACCGGCGTT
301 ACCCTGAGTT ACACCTCGTC GATTTTTttg GCGGTATTTT CCTTCCTGAT
351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT
401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA
451 CCGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA
501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG
551 TGTTTTACCT TTCCGCAACC GGCGTGGCGA TGTCGTCggt ttgggcgacg
601 Ctgaccggct ggCACAcccT GTCCTTTcca tcggcagttt ATCtgtCGGG
651 CATCGGCGTG tccgcgCtgA TTGCCCAaCT GtcgatgAcg cGCGcctaca
701 aaGTCGGCGA CAAATTCACG GTTGCCTCGC ttCCCtaTAt gaccgtcGTC
751 TTTTCCGCCC TGTCTGCCGC ATTTTTTCTg ggcgaagagc tttTCtggCA
801 GGAAATACTC GGTATGTGCA TCATTAtccT CAGCGGCATT TTGAGCAGCA
851 TCCGCCCCAT TGCCTTCAAA CAGCGGCTGC AAGCCCTCTT CCGCCAAAGA
901 TAA

This corresponds to the amino acid sequence <SEQ ID 548; ORF135ng-l>: 

1 MDTAKKDILG SGWMLVAAAC FTVMNVLIKE ASAKFALGSG ELVFWRMLFS 

51 TVTLGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLTTGV

101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGWLL LNPSFRSGQE

151 PAALAGLAGG AMSGWAYLKV RELSLAGEPG WRWFYLSAT GVAMSSVWAT

201 LTGWHTLS FP SAVYLSGIGV SALIA QLSMT RAYKVGDKFT VASLSYMTVV

251 FSALSAAFFL GEELFWQEI L GMCIIILSGI LSSI RPIAFK QRLQALFRQR 

301 * 

ORF135ng-1 and ORF135-1 show 97.0% identity in 300 aa overlap:



orf135ng-1.pep MDTAKKDILGSGWMLVAAACFTVMNVLIKEASAKFALGSGELVFWRMLFSTVTLGAAAVL
orf135-1       MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL

orf135ng-1.pep RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIFLAVFSFLILKE
orf135-1       RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE

orf135ng-1.pep RISVYTQAVLLLGFAGWLLLNPSFRSGQEPAALAGLAGGAMSGWAYLKVRELSLAGEPG
orf135-1       RISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG

orf135ng-1.pep WRWFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSMTRAYKVGDKFT
orf135-1       WRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT

orf135ng-1.pep VASLSYMTWFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPIAFKQRLQALFRQR
orf135-1       VASLSYMTWFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR

Based on this analysis, including the presence of several putative transmembrane domains in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
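
Putative transmembrane segments of the kind referred to here are often screened for with a sliding-window hydropathy average. The sketch below is an illustration only: the source does not say which program produced its predictions, so the use of the Kyte-Doolittle scale, the 19-residue window, the 1.6 threshold and the function name `hydrophobic_windows` are all assumptions.

```python
# Kyte-Doolittle hydropathy index for the twenty standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (start, mean_hydropathy) for windows at or above the threshold.

    Start positions are 1-based and refer to the sequence after characters
    that are not standard residues (spaces, X, *) have been dropped.
    """
    seq = [c for c in protein.upper() if c in KYTE_DOOLITTLE]
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE[c] for c in segment) / window
        if mean >= threshold:
            hits.append((start + 1, round(mean, 2)))
    return hits


if __name__ == "__main__":
    # Residues 101-150 of ORF135ng-1 as listed above (spacing ignored).
    fragment = "TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGWLL LNPSFRSGQE"
    for start, mean in hydrophobic_windows(fragment):
        print(start, mean)
```

A run of consecutive hits suggests a candidate membrane-spanning stretch; dedicated predictors apply more elaborate models, so this heuristic is a first-pass screen rather than a reproduction of the analysis reported in the text.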

Example 66 

The following DNA sequence was identified in N. meningitidis <SEQ ID 549>:

1 ATGAAGCGGC [...]
51 TTTGGGACAA [...]
101 TGCTCTTCCA [...]
151 CTGCCCGGGA [...]
201 GCTCCTCTTC [...]
251 TAGGGGATGC [...]
301 AACGCAAACG [...]
351 TGTTCAGCAC [...]
401 CACATATGTT [...]
451 TTTGACCATG [...]
501 AAAGcTCGCG [...]
551 CGGTTTACCG [...]
601 CATCATATCT [...]
651 GCTTTCTgcC [...]
701 GAATAG

This corresponds to the amino acid sequence <SEQ ID 550; ORF136>:

1 MKRRIAVFVL FPQIIRVLGQ LLPKIVNTVP AHRMLFQIFG MFFFFIHQQY
51 LPGIAEIDSP CGIVFGALLF RHLPAHCLYG KAAVGDAVAH EHPVADWNR
101 NANAFALFDI GQFAXFIVQH TVNIKTVKIN IVDPHMFANF AVFAVLEKRD
151 FDHGKIQGGN NAAAFPKKLA PKI FECFTGA FVGTVYRFVC LFYIINDGIA
201 HHSAPQRVRY LFAPYCGFLP SASDSDLKSS XXSE*

Further work revealed the complete nucleotide sequence <SEQ ID 551>:

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGTTCCCGC AGATAATCCG

51 AGTTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATTTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TATCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TGCGCTCCTC TTCCGTCATC TGCCCGCGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT 

351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG 

451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC 

501 AAAAAAGCTC GCGCCAAAAA TATTTGAATG TTTTACGGGC GCGTTCGTCG

551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC 

601 GCCCATCATT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG 

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This corresponds to the amino acid sequence <SEQ ID 552; ORF136-1>:

1 MMKRR IAVFV LFPQI IRVLG QL LPKIVNTV PAHRMLFQIF GMFFFFIHQQ 

51 YLPGIAEIDS PCGIVFGALL FRHLPAHCLY GKAAVGDAVA HEHPVADWN 

101 RNANAFALFD IGQFAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR 

151 DFDHGKIQGG NNAAAFPKKL APKIFECFT G AFVGTVYRFV CLFYIIN DGI 

201 AHHSAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF136 shows 71.7% identity over a 237aa overlap with an ORF (ORF136a) from strain A of N.
meningitidis:

orf136.pep   MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS
orf136a     MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS

orf136.pep  PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAXFIVQ
orf136a     PCGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADWNRNANAFALFDIGQFAGFIVQ

orf136.pep  HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG
orf136a     HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA

orf136.pep  AFVGTVYRFVCLFYIINDGIAHH--SAPQRVRYLFAPYCGFLPSASDSDLKSSXXSEX
orf136a     R SPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX

The complete length ORF136a nucleotide sequence <SEQ ID 553> is: 

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG 

51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC

101 GGATGCTCTT CCAGATNTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TACGCTCCTC TTCCGTCATC NGTCCACGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGAA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC
301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT
351 CATTGTTCAG CACGCCATAA ATGTAAAGAC CGTCAAAATA AATATCGTCG
401 ATCCACATAT GTTCGCAAAT TTCGCCNTCT TCGCCGTCTT GGAAAAAAGG
451 GCTTTGACCA TGGCAAAATC TAAGGNGNNA NNGATGCGGC GGCGTTCCCA
501 AAAAAGCTCG CGCCAAAAAT ATTTGAATGT TTTGCGGGCG CGTTCGCCGG
551 CACGGTTTAC CGGTTTGTCT GCCTGTTCTA CATAATAAAT GACGGAATCG
601 CCCATCATAT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG
651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT
701 CGGAATAG


This encodes a protein having amino acid sequence <SEQ ID 554>:

1 MMKRR IAVFV LLMQKIRILG QL LPKIVNTV PAHRMLFQXF GMFFFFIHQQ 

51 YLPGIAEIDS PCGIVFGTLL FRHXSTHCLY GKAAVGNAVA HEHPVADWN 

101 RNANAFALFD IGQFAGFIVQ HAINVKTVKI NIVDPHMFAN FAXFAVLEKR

151 ALTMAKSKXX XMRRRSQKSS RQKYLNVLRA RSPARFTGLS ACST**MTES 

201 PIISAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE* 






ORF136a and ORF136-1 show 73.1% identity in 238 aa overlap: 



orf136a.pep MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS
orf136-1    MMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS

orf136a.pep PCGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADWNRNANAFALFDIGQFAGFIVQ
orf136-1    PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAGFIVQ

orf136a.pep HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA
orf136-1    HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG

orf136a.pep R SPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX
orf136-1    AFVGTVYRFVCLFYIINDGIAHH SAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX

Homology with a predicted ORF from N. gonorrhoeae 

ORF136 shows 92.3% identity over a 234aa overlap with a predicted ORF (ORF136ng) from
N. gonorrhoeae:



orf136.pep   MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 59
orf136ng    MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS 60

orf136.pep  PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAXFIVQ 119
orf136ng    PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ 120

orf136.pep  HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 179
orf136ng    HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG 180

orf136.pep  AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSXXSE 234
orf136ng    AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSE 235



The complete length ORF136ng nucleotide sequence <SEQ ID 555> is: 

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG 
51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC
101 GGATGCTCTT CCAAATTTTC GGGATGTTCT TTTTCTTCAT ACACCGGCAA

151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCAGGCGGTA TCGTGTTCGG 

201 TACGCTCCTC TTCCGTCATC TGTCCGCGCA TTGCCTGTAC GGTAAAGCCG 

251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGCCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT CCGCCGGGTT

351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG 

451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC 

501 AAAAAAGCTC GCGCCAAAAG TATTTGAATG TTTTACGGGC GCGTTCGCCG 

551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC

601 GCCCATCATA CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACCG 

651 CGGTTTTCTA CCTCCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This encodes a protein having amino acid sequence <SEQ ID 556>: 

1 MMKRR IAVFV LLMQKIRILG QL LPKIVNTV PAHRMLFQIF GMFFFFIHRQ

51 YLPGIAEIDS PGGIVFGTLL FRHLSAHCLY GKAAVGDAVA HEHPVADVAN 

101 RNANAFALFD IGQSAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR 

151 DFDHGKIQGG NNAAAFPKKL APKVFECFT G AFAGTVYRFV CLFYII NDGI 

201 AHHTAPQRVR YLFAPYRGFL PPASDSDLKS SKYSE* 

ORF136ng and ORF136-1 show 93.6% identity in 235 aa overlap:

orf136ng    MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS
orf136-1    MMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS

orf136ng    PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ
orf136-1    PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAGFIVQ

orf136ng    HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG
orf136-1    HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG

orf136ng    AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSEX
orf136-1    AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 67 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 557>:

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CC.TGCGGAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACC TCCGCAGGTT

251 CGATTGTCGG CAACCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAATGGG TTTATCAAAG GCGCAAAGCT GCAAAATTAC ATCAACCGAA 

401 AACTCCGCGG CATGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCC. . 

This corresponds to the amino acid sequence <SEQ ID 558; ORF137>:

1 MENMVTFSKI RPLLAIAAAA LLAAXRTAGN NAVRKPVQTA KPAAWGLAL 
51 GGGASKGFAH VGIIKVLKEN GIPVKWTGT SAGSIVGNLF ASGMSPDRLE 
101 LEAEILGKTD LVDLTLSTNG FIKGAKLQNY INRKLRGMQI QQFPIKFAA. . 

Further work revealed the complete nucleotide sequence <SEQ ID 559>:

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT

251 CGATTGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AGGGGAATGC

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCCCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 

651 TATTTCCGCC CGTCCGGGCA AAAACATCAG CCAAGGTTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CTGCGTTGCA AAATGAGTTG

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This corresponds to the amino acid sequence <SEQ ID 560; ORF137-1>:

1 MENMVTFSKI RPLLAIAAAA LLAA CGTAGN NAVRKPVQTA KPAAWGLAL 

51 GGGASKGFAH VGIIKVLKEN GIPVKWTGT SAGSIVGSLF ASGMSPDRLE 

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV 

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV 

201 PVSAARRQGA NFVIAVDISA RPGKNISQGF FSYLDQTLNV MSVSALQNEL

251 GQADWIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY 

301 * 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)
ORF137 shows 93.3% identity over a 149aa overlap with an ORF (ORF137a) from strain A of N.
meningitidis:

orf137.pep  MENMVTFSKIRPLLAIAAAALLAAXRTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH
orf137a     MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAWGLALGGGASKGFAH

orf137.pep  VGIIKVLKENGIPVKWTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG
orf137a     VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG

orf137.pep  FIKGAKLQNYINRKLRGMQIQQFPIKFAA
orf137a     FIKGEKLQNYINRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV

The complete length ORF137a nucleotide sequence <SEQ ID 561> is:

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGCCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATAGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG TAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCGGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 



601 CCCGTCAGTG CCGCCCGGCG GCANGNNNNG NATNTCGTGA TTGCCGTCGA

651 TATTTCCGCC CGTCCGAGCA AAAACATCAG CCAAGGCTTC TTCTCTTATC

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CCGCGTTGCA AAATGAGTTG 

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This encodes a protein having amino acid sequence <SEQ ID 562>: 

1 MENMVTFSKI RPLLAIAAAA LLAA CGTAGN NAARKPVQTA KPAAWGLAL

51 GGGASKGFAH VGIIKVLKEN GIPVKWTGT SAGSIVGSLF ASGMSPDRLE

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRRI QQFPIKFAAV 

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV 

201 PVSAARRXXX XXVIAVDISA RPSKNISQGF FSYLDQTLNV MSVSALQNEL 

251 GQADWIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY

301 *

ORF137a and ORF137-1 show 97.3% identity in 300 aa overlap: 



orf137a.pep MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAVVGLALGGGASKGFAH
orf137-1    MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH

orf137a.pep VGIIKVLKENGIPVKWTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG
orf137-1    VGIIKVLKENGIPVKWTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG

orf137a.pep FIKGEKLQNYINRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV
orf137-1    FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV

orf137a.pep FQPVIIGRHTYVDGGLSQPVPVSAARRXXXXXVIAVDISARPSKNISQGFFSYLDQTLNV
orf137-1    FQPVIIGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV

orf137a.pep MSVSALQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY
orf137-1    MSVSALQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY

Homology with a predicted ORF from N. gonorrhoeae

ORF137 shows 89.9% identity over a 149aa overlap with a predicted ORF (ORF137ng) from 
N. gonorrhoeae: 

orf137.pep  MENMVTFSKIRPLLAIAAAALLAAXRTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH 60
orf137ng    MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAWALALGGGASKGFAH 60

orf137.pep  VGIIKVLKENGIPVKVVTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG 120
orf137ng    IGIVKVLKENGIPVKVVTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 120

orf137.pep  FIKGAKLQNYINRKLRGMQIQQFPIKFAA 149
orf137ng    FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 180

The complete length ORF137ng nucleotide sequence <SEQ ID 563> is: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGATCATTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGTAC GGCGGGAAAC AATGCCGCCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGC TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT ATAGGAATTG TTAAGGTTTT

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATAGTCGG CAGCCTTTTG GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AGATTTTAGG TAAAACCGAT TTAGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT

451 GCCACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC

501 CGGGCAGGCG GTTCGTGCTT CCGCCGCCAT TCCCAATGTG TTCCAGCCAG

551 TCATCATCGG CAGGCACAAA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCTCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 

651 TATTTCCGCA CGTCCGAGCA AAAATGTCGG TCAAGGTTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTG ATGAGCGTTT CCGTGTTGCA AAACGAGTTG

751 gggcAGGCGG ATGTGGTTAT CAAACCGCag gtTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AGCGCGCCAT CCGGTTGGGC GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 
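
A numbered listing such as the one above can be checked mechanically. The following is a minimal sketch, assuming each row has the form "position, then blocks of ten bases"; the helper names `join_listing` and `looks_like_complete_orf` are hypothetical and are not part of the original disclosure, and the two-row input is a shortened toy listing rather than SEQ ID 563 itself.

```python
def join_listing(lines):
    """Concatenate the base blocks of a numbered sequence listing.

    Each non-empty line is assumed to look like
    '51 CGCCGCCGCG TTGCTTGCCG ...', where the leading number is the
    1-based position of the first base on that line.
    """
    seq = ""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank spacer lines
        start, blocks = int(fields[0]), fields[1:]
        if start != len(seq) + 1:
            raise ValueError(f"expected position {len(seq) + 1}, found {start}")
        seq += "".join(blocks).upper()
    return seq


def looks_like_complete_orf(seq):
    """True if seq starts with ATG, ends with a stop codon and is a multiple of 3."""
    return (
        len(seq) % 3 == 0
        and seq.startswith("ATG")
        and seq[-3:] in {"TAA", "TAG", "TGA"}
    )


# Toy listing (hypothetical): first 30 bases of a row followed by a stop codon.
toy = ["1 ATGGAAAATA TGGTAACGTT TTCAAAAATC", "", "31 TGA"]
print(looks_like_complete_orf(join_listing(toy)))
```

For SEQ ID 563 the last row starts at position 901 and holds three bases, so the length is 903 = 3 x 301, which is consistent with 300 residues followed by a stop codon.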

This encodes a protein having amino acid sequence <SEQ ID 564>:

1 MENMVTFSKI RSFLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVALAL

51 GGGASKGFAH IGIVKVLKEN GIPVKVVTGT SAGSIVGSLL ASGMSPDRLE

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHK YVDGGLSQPV

201 PVSAARRQGA NFVIAVDISA RPSKNVGQGF FSYLDQTLNV MSVSVLQNEL

251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY

301 * 

ORF137ng and ORF137-1 show 96.0% identity in 300 aa overlap: 

orf137ng    MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAVVALALGGGASKGFAH
orf137-1    MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH

orf137ng    IGIVKVLKENGIPVKVVTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG
orf137-1    VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG

orf137ng    FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV
orf137-1    FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV

orf137ng    FQPVIIGRHKYVDGGLSQPVPVSAARRQGANFVIAVDISARPSKNVGQGFFSYLDQTLNV
orf137-1    FQPVIIGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV

orf137ng    MSVSVLQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY
orf137-1    MSVSALQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site
(underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
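
The text does not state which tool or pattern was used to predict the lipid attachment site. As a rough illustration only, the sketch below scans an N-terminal region for a simplified "lipobox" consensus (a hydrophobic residue, two small residues, then the lipid-modified cysteine). The regular expression, the 40-residue cut-off and the function name `find_lipobox` are assumptions made for this example, not the method used in the original analysis.

```python
import re

# Simplified lipobox consensus, an assumption for illustration:
# [L/V/I] - [A/S/T/V/I] - [G/A/S] - C
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")


def find_lipobox(protein, max_cys_pos=40):
    """Return the 1-based position of a candidate lipobox cysteine, or None.

    Only cysteines within the first max_cys_pos residues are considered,
    since the site is expected near the end of the signal peptide.
    """
    for match in LIPOBOX.finditer(protein):
        cys_pos = match.end()  # 0-based exclusive end == 1-based index of the C
        if cys_pos <= max_cys_pos:
            return cys_pos
    return None


# First 40 residues of the gonococcal protein <SEQ ID 564>.
n_term = "MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTA"
print(find_lipobox(n_term))  # 25
```

In this sketch the match is the LAAC motif, with the cysteine at residue 25 of the gonococcal sequence.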

Example 68 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 565>: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGcTG CCGCTTTCCT

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCmAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA

351 ACACGAAGGG CTGCTATTC . . 

This corresponds to the amino acid sequence <SEQ ID 566; ORF138>: 

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL 
51 KEDRARIVAX MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 
101 MFKAVHGWEH VQQALDKHEG LLF

Further work revealed the complete nucleotide sequence <SEQ ID 567>:

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC 

601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG 

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG

701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT

751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC

801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT

851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA 

This corresponds to the amino acid sequence <SEQ ID 568; ORF138-1>:

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF138 shows 99.2% identity over a 123aa overlap with an ORF (ORF138a) from strain A of
N. meningitidis:

orf138.pep  MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 60
orf138a     MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 60

orf138.pep  MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 120
orf138a     MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 120

orf138.pep  LLF 123
orf138a     LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 180
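
The identity figure quoted for this overlap can be reproduced by counting matching columns. The sketch below is illustrative only: `percent_identity` is a hypothetical helper, it assumes the two strings are already aligned column by column, and the strain A row is taken as displayed in the alignment above, where it differs from the partial ORF138 sequence only at the ambiguous residue X (position 60).

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns containing a gap character '-' are counted as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Partial ORF138 <SEQ ID 566>, written in the same 10-residue blocks as the listing.
orf138 = (
    "MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL "
    "KEDRARIVAX MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET "
    "MFKAVHGWEH VQQALDKHEG LLF"
).replace(" ", "")

# Strain A row over the same 123 columns, as shown in the alignment (N instead of X).
orf138a_overlap = orf138.replace("X", "N")

print(len(orf138), round(percent_identity(orf138, orf138a_overlap), 1))
```

With one mismatch in 123 columns the ratio is 122/123, which rounds to the 99.2% stated in the text.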

The complete length ORF138a nucleotide sequence <SEQ ID 569> is:

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGTCAGG CAGGCATGAA 

201 TCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC 

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC

601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG 

701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT 

751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC 

801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT

851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 570>: 

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP*

ORF138a and ORF138-1 show 99.7% identity over a 298aa overlap: 

orf138a.pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN
orf138-1    MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN

orf138a.pep MRQAGMNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG
orf138-1    MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG

orf138a.pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG
orf138-1    LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG

orf138a.pep VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF
orf138-1    VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF

orf138a.pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP
orf138-1    CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP



Homology with a predicted ORF from N. gonorrhoeae

ORF138 shows 94.3% identity over a 123aa overlap with a predicted ORF (ORF138ng) from 

N. gonorrhoeae: 

orf138.pep  MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 60
orf138ng    MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN 60

orf138.pep  MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 120
orf138ng    MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG 120

orf138.pep  LLF 123
orf138ng    LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG 180



The complete length ORF138ng nucleotide sequence <SEQ ID 571> is:

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG TCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACACG CAGACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAATGCG

251 GTTTGGAACT TGCCCCCGCG TTTTTCAAAA AACCGGAAGA CATCGAAACA

301 ATGTTCAAAG CGGTACACGG CTGGGAACAC GTGCAGCAGG CTTTGGACAA 

351 GGGCGAAGGG CTGCTGTTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCACCTGAC CGCCATGTAC 

451 AAGCCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 GCGCGGCAAA GGCAAAACcg cgcccaccgg catACAAGGG GTCAAACAAA

551 tcatcaAGGC CCTGCGCGCG GGCGAGGCAA CCAtcATCCT GCCCGACCAC

601 GTCCCTTCTC CGCAGGAagg cggCGGCGTG TGGGCGGATT TTTTCGGCAA

651 ACCTGCATAc acCATGACAC TGGCGGCAAA ATTGGCACAC GTCAAAGGCG 

701 TGAAAACCCT GTTTTTCTGC TGCGAACGCC TGCCCGACGG ACAAGGCTTC

751 GTGTTGCACA TCCGCCCCGT CCAAGGGGAA TTGAACGGCA ACAAAGCCCA 

801 CGATGCCGCC GTGTTCAACC GCAATACCGA ATATTGGATA CGCCGTTTTC

851 CGACGCAGTA TCTGTTTATG TACAACCGCT ATAAAACGCC GTAA 

This encodes a protein having amino acid sequence <SEQ ID 572>: 

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL SLSCLHTLGN RLGHLAFYLL

51 KEDRARIVAN MRQAGLNPDT QTVKAVFAET AKCGLELAPA FFKKPEDIET 

101 MFKAVHGWEH VQQALDKGEG LLFITPHIGS YDLGGRYISQ QLPFHLTAMY

151 KPPKIKAIDK IMQAGRVRGK GKTAPTGIQG VKQIIKALRA GEATIILPDH 

201 VPSPQEGGGV WADFFGKPAY TMTLAAKLAH VKGVKTLFFC CERLPDGQGF 

251 VLHIRPVQGE LNGNKAHDAA VFNRNTEYWI RRFPTQYLFM YNRYKTP* 

ORF138ng and ORF138-1 show 94.3% identity over 299aa overlap: 

orf138-1.pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN
orf138ng     MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN

orf138-1.pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG
orf138ng     MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG

orf138-1.pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG
orf138ng     LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG

orf138-1.pep VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF
orf138ng     VKQIIKALRAGEATIILPDHVPSPQEGG-GVWADFFGKPAYTMTLAAKLAHVKGVKTLFF

orf138-1.pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP
orf138ng     CCERLPDGQGFVLHIRPVQGELNGNKAHDAAVFNRNTEYWIRRFPTQYLFMYNRYKTP

In addition, ORF138ng is homologous to htrB protein from Pseudomonas fluorescens:

gnl|PID|e334283 (Y14568) htrB [Pseudomonas fluorescens] Length = 253

Score = 80.8 bits (196), Expect = 9e-15

Identities = 49/151 (32%), Positives = 79/151 (51%), Gaps = 6/151 (3%)

Query: 101 MFKAVHGWEHVQQALDKGEGLLFITPHIGSYD-LGGRYISQQLPFHLTAMYKPPKIKAID 159
+ + V G E +++AL G+G++ IT H+G+++ L Y SQ P Y+PPK+KA+D

Sbjct: 94 LVREVEGLEVLKEALASGKGWGITSHLGNWEVLNHFYCSQCKPI 1 FYRPPKLKAVD 150 

Query: 160 KIMQAGRVRGKGKTAPTGIQGVKQIIKALRAGEATIILPDHVPSPQEGGGVWADFFGKPA 219 
++++ RV+ K A + +G+ +IK +R G I D P P E G++ FF A 
Sbjct: 151 ELLRKQRVQLGNKVAASTKEGILSVIKEVRKGGQVGIPAD--PEPAESAGIFVPFFATQA 208

Query: 220 YTMTLAAKLAHVKGVKTLFFCCERLPDGQGF 250

T + +F RLPDG G+ 

Sbjct: 209 LTSKFVPNMLAGGKAVGVFLHALRLPDGSGY 239 

50 Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
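
The text does not say how the putative transmembrane domain was detected. One common way to screen for such segments is a sliding-window hydropathy profile; the sketch below uses the Kyte-Doolittle scale with a 19-residue window and a mean-hydropathy threshold of 1.6. The function names, window and threshold are choices made for this illustration and should not be read as the analysis actually performed.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the sequence.

    Unknown symbols (for example X) are scored as 0.0 in this sketch.
    """
    scores = [KD.get(residue, 0.0) for residue in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def has_candidate_tm_segment(protein, window=19, threshold=1.6):
    """True if any window reaches the mean-hydropathy threshold."""
    return any(v >= threshold for v in hydropathy_profile(protein, window))


# First 50 residues of the gonococcal protein <SEQ ID 572>.
n_term = "MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLL"
print(has_candidate_tm_segment(n_term))
```

A profile of this kind only flags hydrophobic stretches; it does not by itself establish membrane topology.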

ORF138-1 (57kDa) was cloned in the pGex vectors and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 14A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis



(Figure 14B). These experiments confirm that ORF138-1 is a surface-exposed protein, and that it 
is a useful immunogen. 

Example 69 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 573>:

1 . . GCGTGGTCGG CCGGCGAATC GTGGCGTGTG TTAATGGAAA GTGAAACGTG 

51 GCATGCGGTG TGGAATACTT TGCGCTTCTC GGCGGCGGCG GTGTATGCGG 

101 CAGCGGTTTT GGGTGTGGTG TATGCGGCGC CGGCGCGGCG GTCGGCGTGG 

151 ATGCGCGGGC TGATGTTTTA GCCGTTTATG GTGTCGCCGG TTTGTGTTTC 

201 GGCGGGCGTG CTGCTGCTTT ATCCGCAGTG GACGGCTTCG TTGCCGTTGC 

251 TGCTGGCGAT GTATGCGCTG CTGGCGTATC CGTTTGTGGC AAAAGATGTT 

301 TTATCAGCCT GGGATGCACT GCCGCCGGAT TACGGCAGGG CGGCGGCGGG 

351 TTTGGGTGCA AACGGCTTTC AGACGGCATG CCGCATCACG TTCCCCCTCT 

401 TGAAACCGGC GTTGCGGCGC GGTCTGACTT TGGCGGCGGC AACCTGCGTG 

451 GGCGAATTTG CGGCGACATT GTTTCTGTCG CGTCCGGAAT GGCAGACGCT 

501 GACGACTTTG ATTTATGCCT ATTTGGGACG CGCGGGTGAG GATAATTACG 

551 CGCGGGCGAT GGTGCTG . . 

This corresponds to the amino acid sequence <SEQ ID 574; ORF139>: 

1 ..AWSAGESWRV LMESETWHAV WNTLRFSAAA VYAAAVLGVV YAAPARRSAW

51 MRGLMFXPFM VSPVCVSAGV LLLYPQWTAS LPLLLAMYAL LAYPFVAKDV 

101 LSAWDALPPD YGRAAAGLGA NGFQTACRIT FPLLKPALRR GLTLAAATCV 

151 GEFAATLFLS RPEWQTLTTL IYAYLGRAGE DNYARAMVL. . 

Further work revealed the complete nucleotide sequence <SEQ ID 575>: 



1 ATGGATGGAC GGCGTTGGGT GGTATGGGGT GCTTTTGCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG 

301 TTGGTGGCGG GCGTGGGCGT GCTGGCCCTG TTCGGGGCGG ACGGGCTGTT 

351 GTGGCGCGGC AGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTCAACCT TCCTGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGTGCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTT CTGTATTGTT TTTCCGGGTT CGGGCTGGCG 

601 CTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTGGTGTGGC 

701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCTGTGATGC CGTCGCCGCC 

801 GCAGTCGGTC GGGGAATATG TGCTGCTGGC GTTTGCGGCG GCGGTGTTGT 

851 CTGTGTGCTG CCTGTTTCCT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG

901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT 

951 GTGGAATACT TTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG 

1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT TATCCGCAGT GGACGGCTTC GTTGCCGTTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC 

1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC 

1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT 

1351 GCGGCGACAT TGTTTCTGTC GCGTCCGGAA TGGCAGACGC TGACGACTTT 

1401 GATTTATGCC TATTTGGGAC GCGCGGGTGA GGATAATTAC GCGCGGGCGA 

1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT TTTCCTGCTG 

1501 TTGGACGGCG GCGAAGGCGG AAAACAGACG GAAACGTTAT AA 

This corresponds to the amino acid sequence <SEQ ID 576; ORF139-1>:

1 MDGRRWVVWG AFALLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFVQ
151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVLGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVLSVCCLFP LLAIVVKAWS
301 AGESWRVLME SETWQAVWNT LRFSAAAVYA AAVLGVVYAA AARRSAWMRG
351 LMFLPFMVSP VCVSAGVLLL YPQWTASLPL LLAMYALLAY PFVAKDVLSA
401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL AAFALGIFLL
501 LDGGEGGKQT ETL*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF139 shows 94.7% identity over a 189aa overlap with an ORF (ORF139a) from strain A of
N. meningitidis:

orf139.pep                                AWSAGESWRVLMESETWHAVWNTLRFSAAA 30
orf139a     QSVGEYVLLAFAAAVXSVCCLFXLLAIVVKAWSAGESWRVLMESETWQAVWNTXRFSAAA 327

orf139.pep  VYAAAVLGVVYAAPARRSAWMRGLMFXPFMVSPVCVSAGVLLLYPQWTASLPLLLAMYAL 90
orf139a     VYAAAVLGVVYAAAARRSAWMRGLMFLPFMVSPVCVSAGVLLLXPQWTASLPLLLAMYAL 387

orf139.pep  LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 150
orf139a     LAYPFVAKDVLSAXDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 447

orf139.pep  GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 189
orf139a     GEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNYARAMVLTLLLAAFALGXFLLLDGGEGG 507

The complete length ORF139a nucleotide sequence <SEQ ID 577> is:

1 ATGGATGGAC GGCGTTGGGC GGTATGGGGT GCTTTTGCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGCAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG 

301 TTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGCCTGTN 

351 GTGGCGCGGC TGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTTNACCT TCCTGTGTTG GTCAGGGCGG CATATCAGGG GTTTGTGCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACNG ACATTGGGCG CGGGGGCGTG

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA 

601 TTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTNGTGTGGC 

701 TGGTGTNGGG GGTAACNGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC

751 AGGCGCGCGG TTTCGGATAA GGCNGTTTCC CCTGTGATGC CGTCGCCGCC 

801 GCAGTCGGTC GGGGAATATG TGCTNCTGGC GTTTGCGGCG GCGGTGTNGT 

851 CTGTGTGCTG CCTGTTTCNT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT 

951 GTGGAATACT NTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG 

1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT NATCCGCAGT GGACGGCTTC GTTGCCGCTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC 

1201 TGNGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC

1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT 

1351 GCGGCAACCT TGTTCNTGTC GCGTCNCGAG TGGCAGACGC TGACGACTTT 



1401 GATTTATGCC TATNTGGGAC GCGCGGGTGA NGATAATTAC GCGCGGGCGA 
1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT NTTCCTGCTG 
1501 TTGGACGGCG GCGAAGGCGG AAAACGGACG GAAACGTTAT AA 

This encodes a protein having amino acid sequence <SEQ ID 578>: 



1 MDGRRWAVWG AFALLPSAFL AAMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLXWRG WQDTPYLLLY GNVFFXLPVL VRAAYQGFVQ
151 VPAARLQTAX TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVXGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVXSVCCLFX LLAIVVKAWS
301 AGESWRVLME SETWQAVWNT XRFSAAAVYA AAVLGVVYAA AARRSAWMRG
351 LMFLPFMVSP VCVSAGVLLL XPQWTASLPL LLAMYALLAY PFVAKDVLSA
401 XDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFXSRXE WQTLTTLIYA YXGRAGXDNY ARAMVLTLLL AAFALGXFLL
501 LDGGEGGKRT ETL*



ORF139a and ORF139-1 show 96.5% homology over a 514aa overlap: 



orf139a.pep MDGRRWAVWGAFALLPSAFLAAMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA
orf139-1    MDGRRWVVWGAFALLPSAFLAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA

orf139a.pep ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLXWRG
orf139-1    ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG

orf139a.pep WQDTPYLLLYGNVFFXLPVLVRAAYQGFVQVPAARLQTAXTLGAGAWRRFWDIEMPVLRP
orf139-1    RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP

orf139a.pep WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVXGVTA
orf139-1    WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA

orf139a.pep AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVXSVCCLFXLLAIVVKAWS
orf139-1    AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIVVKAWS

orf139a.pep AGESWRVLMESETWQAVWNTXRFSAAAVYAAAVLGVVYAAAARRSAWMRGLMFLPFMVSP
orf139-1    AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGVVYAAAARRSAWMRGLMFLPFMVSP

orf139a.pep VCVSAGVLLLXPQWTASLPLLLAMYALLAYPFVAKDVLSAXDALPPDYGRAAAGLGANGF
orf139-1    VCVSAGVLLLYPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF

orf139a.pep QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNY
orf139-1    QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY

orf139a.pep ARAMVLTLLLAAFALGXFLLLDGGEGGKRTETLX
orf139-1    ARAMVLTLLLAAFALGIFLLLDGGEGGKQTETLX



Homology with a predicted ORF from N. gonorrhoeae

ORF139 shows 95.2% identity over a 189aa overlap with a predicted ORF (ORF139ng) from
N. gonorrhoeae:

orf139.pep                                AWSAGESWRVLMESETWHAVWNTLRFSAAA 30
orf139ng    QSVGEYVLLAFSVAVLSVCCLFPLSAIVVKAWSAGESRRVLMESETWQAVWNTLRFSAAA 327

orf139.pep  VYAAAVLGVVYAAPARRSAWMRGLMFXPFMVSPVCVSAGVLLLYPQWTASLPLLLAMYAL 90
orf139ng    VFAAAVLGVVYAAAARRLVWMRGLVFLPFMVSPVCVSAGVLLLYPGWTASLPLLLAMYAL 387

orf139.pep  LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 150
orf139ng    LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 447

orf139.pep  GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 189
orf139ng    GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVLTLLLSAFAVCIFLLLDNGEGG 507



The complete length ORF139ng nucleotide sequence <SEQ ID 579> is predicted to encode a
protein having amino acid sequence <SEQ ID 580>:



1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ
151 VPAARLQTAR TLGAGAWRPF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIVVKAWS
301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGVVYAA AARRLVWMRG
351 LVFLPFMVSP VCVSAGVLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA
401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL
501 LDNGEGGKRT ETL*



Further work revealed a variant gonococcal DNA sequence <SEQ ID 581>:



1 ATGGATGGAC GGTGTTGGGC GGTACGGGGT GCTTTTTCCC TGCTGCCTTC
51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT
101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA
151 CGTTTGGCGT GGACGGTGTT TCAGGCGGCG GCAACCTGTG TGCTGGTGCT
201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTCCCGG
251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCGTTTGT GATGCCCACG
301 CTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGGCTGTT
351 GTGGCGCGGC CGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT
401 TTTTCAACCT GCCCGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGCTCAA
451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG
501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG
551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA
601 TTGCTGTTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA
651 GTTGGTTATG TTCGAACTCG ATATGGCGGG GGCTTCGGCG CTGGTGTGGC
701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC
751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCCGTGATGC CGTCGCCGCC
801 GCAATCGGTG GGGGAATATG TATTGCTGGC ATTTTCGGTG GCGGTGTTGT
851 CCGTGTGCTG CCTGTTTCCT TTGTCGGCAA TTGTTGTGAA AGCGTGGTCG
901 GCCGGCGAAT CGCGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCAGT
951 GTGGAATACt ttGCGCTTTT CGGCGGCGGC GGTGTTTGCG GCGGCGGTTT
1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGCTGGTGTG GATGCGCGGA
1051 CTGGTGTTTT TACCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT
1101 GCTGCTGCTT TATCCGGGGT GGACGGCTTC GTTACCGCTG CTGCTGGCGA
1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCGGCC
1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCAG GTTTGGGCGC
1251 AAACGGCTTT CAGACGGCAT GCCGTATCAC GTTCCCCCTC TTGAAACCGG
1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CGACGTGTGT GGGCGAATTT
1351 GCGGCAACCT TGTTCCTGTC GCGTCCGGAA TGGCAGACGT TGACGACTTT
1401 GATTTATGCC TATTTGGGGC GTGCGGGTGA GGACAATTAT GCGCGGGCAA
1451 TGGTGTTGAC ATTGCTGTTG TCGGCATTTG CGGTGTGCAT TTTCCTGCTG
1501 TTGGACAACG GCGAAGGCGg aaaACGGACG GAAACGTTAT AA



This corresponds to the amino acid sequence <SEQ ID 582; ORF139ng-l>: 



1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK

51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT

101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ

151 VPAARLQTAR TLGAGAWRPF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA

201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIVVKAWS

301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGVVYAA AARRLVWMRG

351 LVFLPFMVSP VCVSAGVLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA

401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF

451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL

501 LDNGEGGKRT ETL*



ORF139ng-1 and ORF139-1 show 95.9% identity over 513aa overlap:

orf139ng    MDGRCWAVRGAFSLLPSAFLAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA
orf139-1    MDGRRWVVWGAFALLPSAFLAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA

orf139ng    ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG
orf139-1    ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG

orf139ng    RQDTPYLLLYGNVFFNLPVLVRAAYQGFAQVPAARLQTARTLGAGAWRRFWDIEMPVLRP
orf139-1    RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP

orf139ng    WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAGASALVWLVLGVTA
orf139-1    WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA

orf139ng    AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFSVAVLSVCCLFPLSAIVVKAWS
orf139-1    AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIVVKAWS

orf139ng    AGESRRVLMESETWQAVWNTLRFSAAAVFAAAVLGVVYAAAARRLVWMRGLVFLPFMVSP
orf139-1    AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGVVYAAAARRSAWMRGLMFLPFMVSP

orf139ng    VCVSAGVLLLYPGWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF
orf139-1    VCVSAGVLLLYPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF

orf139ng    QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY
orf139-1    QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY

orf139ng    ARAMVLTLLLSAFAVCIFLLLDNGEGGKRTETL
orf139-1    ARAMVLTLLLAAFALGIFLLLDGGEGGKQTETL

Based on the presence of a predicted binding-protein-dependent transport systems inner membrane
component signature (underlined) in the gonococcal protein, it is predicted that the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.

Example 70 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 583>: 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAGA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC

151 GGTTTGCCCA CAGGCAGCAT TGTCAAAGAC ATACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 AACGTTTGGT C. . . 

This corresponds to the amino acid sequence <SEQ ID 584; ORF140>: 

1 MDGWTQTLSA QTLLGISAAA IILILILIVR FRIHALLTLV IVSLLTALAT

51 GLPTGSIVKD ILVKNFGGTL GGVALLVGLG AMLERLV. . 
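
A listing of this kind can be cross-checked by translating the nucleotide sequence and comparing the result with the stated amino acid sequence. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the name `translate` and the choice to discard every character other than A, C, G and T (position numbers, spaces, trailing dots) are illustrative assumptions, not part of the original disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string; '*' marks a stop codon.

    Characters other than A/C/G/T are dropped, so an unknown base written
    as '.' in the middle of a sequence would shift the frame. This sketch
    is only suitable for listings whose gaps sit at the ends.
    """
    seq = "".join(ch for ch in dna.upper() if ch in "ACGT")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Partial sequence <SEQ ID 583> as listed above, rows 1 to 251.
SEQ_583_PARTIAL = """
ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC
GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAGA TTCCGCATCC
ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC
GGTTTGCCCA CAGGCAGCAT TGTCAAAGAC ATACTGGTCA AAAACTTCGG
CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG
AACGTTTGGT C...
"""

print(translate(SEQ_583_PARTIAL))
```

The printed string can be compared residue by residue with the ORF140 listing.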

Further work revealed the complete nucleotide sequence <SEQ ID 585>: 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC
51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC
101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC
151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC ATACTGGTCA AAAACTTCGG
201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG
251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG
301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC
351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC
401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC
451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC
501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG
551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC
601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT
651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC
701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG
751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG
801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA
851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA
901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC
951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG
1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG
1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT
1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC
1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC
1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA
1251 CGACTCCGGC TTCTGGCTGG TCGGCCGTCT CTTGGACATG GACGTACCGA
1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC
1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA



This corresponds to the amino acid sequence <SEQ ID 586; ORF140-1>:

1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT
51 GLPTGSIVND ILVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL
101 IRMFGEKRAP FALGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP
151 FALASIGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF
201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAKAGTV VAIMLIPMLL
251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG STPIALLISV LVALFVLGRK
301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVLRASG IGKALADSMA
351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA
401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIALIG
451 FALSALLFAI V*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF140 shows 95.4% identity over an 87 aa overlap with an ORF (ORF140a) from strain A of N. meningitidis:



                    10        20        30        40        50        60
orf140.pep  MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD
            |||||||||||||||||||||||||||||:||||||||||||||||||||||||||||:|
orf140a     MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND
                    10        20        30        40        50        60

                    70        80
orf140.pep  ILVKNFGGTLGGVALLVGLGAMLERLV
            :|||||||||||||||||||||| |||
orf140a     VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF
                    70        80        90       100       110       120
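
The identity figure quoted for an ungapped overlap such as this one is simply the fraction of aligned positions that carry the same residue. The following is a minimal sketch of that calculation; `percent_identity` is a hypothetical helper name, and it assumes two already-aligned strings of equal length with no gap characters. It is not the alignment program used in the original analysis.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions with identical residues in an ungapped overlap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must cover the same overlap length")
    if not seq_a:
        raise ValueError("overlap is empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# ORF140 (87 aa) and the first 87 residues of ORF140a, as listed in this example.
ORF140 = (
    "MDGWTQTLSAQTLLGISAAAIILILILIVR"
    "FRIHALLTLVIVSLLTALATGLPTGSIVKD"
    "ILVKNFGGTLGGVALLVGLGAMLERLV"
)
ORF140A_OVERLAP = (
    "MDGWTQTLSAQTLLGISAAAIILILILIVK"
    "FRIHALLTLVIVSLLTALATGLPTGSIVND"
    "VLVKNFGGTLGGVALLVGLGAMLGRLV"
)

# 83 of 87 positions match, which reproduces the 95.4% stated above.
print(f"{percent_identity(ORF140, ORF140A_OVERLAP):.1f}%")
```

The four differing positions are 30 (R/K), 59 (K/N), 61 (I/V) and 84 (E/G).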



The complete length ORF140a nucleotide sequence <SEQ ID 587> is:

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC
51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC
101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC
151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC GTACTGGTCA AAAACTTCGG
201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG
251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG
301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC
351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC
401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC
451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC
501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG
551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC
601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT
651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC
701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG
751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG
801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA
851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACG^AAA
901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC
951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG
1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG
1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT
1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC
1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC
1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA
1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGACATG GACGTACCGA
1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC
1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA

This encodes a protein having amino acid sequence <SEQ ID 588>: 

1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT
51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL
101 IRMFGEKRAP FALGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP
151 FALASIGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF
201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAKAGTV VAIMLIPMLL
251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG STPIALLISV LVALFVLGRK
301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVLRASG IGKALADSMA
351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA
401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIALIG
451 FALSALLFAI V*

ORF140a and ORF140-1 show 99.8% identity over a 461aa overlap: 

orf140-1.pep  MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND  60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND  60

orf140-1.pep  ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF  120
              :|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF  120

orf140-1.pep  GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG  180
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG  180

orf140-1.pep  ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV  240
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV  240

orf140-1.pep  VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK  300
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK  300

orf140-1.pep  RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC  360
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC  360

orf140-1.pep  FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG  420
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG  420

orf140-1.pep  FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV  461
              |||||||||||||||||||||||||||||||||||||||||
orf140a       FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV  461

Homology with a predicted ORF from N. gonorrhoeae

ORF140 shows 92% identity over an 87 aa overlap with a predicted ORF (ORF140ng) from
N. gonorrhoeae:
orf140.pep  MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD   60
orf140ng    MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND   60

orf140.pep  ILVKNFGGTLGGVALLVGLGAMLERLV   87
orf140ng    VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF  120



The complete length ORF140ng nucleotide sequence <SEQ ID 589> was predicted to encode a 
protein having amino acid sequence <SEQ ID 590>: 

1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLV ETS GGAQSLADAL 

101 IRMFGEKRAP FAPGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP 

151 FALASVGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPAK AGTV VAVMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG S TPVALLISV LAALLVLG RK 

301 RGESGSTLEK TVDGALAP AC SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLSDM DVPTTLKTWT VNQT LIAFIG 

451 FALSALLFAI V * 

Further work revealed a variant gonococcal DNA sequence <SEQ ID 591>:



1 ATGGACGGCC GGACACAGAC GCTGTCCGCG CAAACCTTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 GCGCGCTGCT GACACTGGTC ATCGCCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT CGTCAACGAC GTACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGTCTGGGC GCAATGCTCG 

251 GACGTTTGGT AGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCTCCGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT ATTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

451 TTCGCGCTTG CCTCCGTCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC 

501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

551 GCCAGGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCGCCATCC ATGTTCCCGT 

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAGCGACCCG CCGAAAGAAC 

701 CTGCCAAAGC AGGAACGGTC GTCGCCGTCA TGCTGATTCC CATGCTGCTG 

751 ATTTTCCTGA ATACCGGCGT ATCAGCCCTC ATCAGCGAAA AACTCGTAAG 

801 TGCGGACGAA ACTTGGGTTC AGACGGCAAA AATGATCGGT TCGACACCTG 

851 TCGCCCTTCT GATTTCCGTA TTGGCCGCAC TGTTGGTCTT GGGACGCAAA 

901 CGCGGCGAAA GCGGCAGCAC GTTGGAAAAA ACCGTGGACG GCGCACTCGC

951 CCCCGCCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG 

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGC TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACA GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA 

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGATATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ATTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTTGCCATC GTCTGA 
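
Each row of a listing like the one above begins with the position of its first residue, which allows a transcription to be checked mechanically. The following is a minimal sketch, assuming rows of the form "position block block ..." with one row per line; `parse_numbered_rows` is a hypothetical helper introduced for illustration only.

```python
def parse_numbered_rows(text: str) -> str:
    """Join numbered sequence rows into one string.

    Raises ValueError if a row's leading position label does not equal
    one plus the number of residues read so far, which flags dropped
    blocks, merged rows, or a misread index.
    """
    blocks = []
    length = 0
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines between rows
        start = int(parts[0])
        if start != length + 1:
            raise ValueError(
                f"row labelled {start}, but {length} residues read so far"
            )
        for block in parts[1:]:
            blocks.append(block)
            length += len(block)
    return "".join(blocks)


# First two rows of <SEQ ID 591> as listed above.
rows = """
1 ATGGACGGCC GGACACAGAC GCTGTCCGCG CAAACCTTGT TGGGCATTTC

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC
"""
sequence = parse_numbered_rows(rows)
print(len(sequence))
```

A row whose label has been misread (for example "4 51" in place of "451") fails at `int(...)` or at the position check, so the damaged row is identified directly.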



This corresponds to the amino acid sequence <SEQ ID 592; ORF140ng-1>:



1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT 

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLV ETS GGAQSLADAL 

101 IRMFGEKRAP FAPGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQDVLP 

151 FALASVGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPAK AGTV VAVMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG S TPVALLISV LAALLVLG RK 

301 RGESGSTLEK TVDGALAPAC SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIAFIG

451 FALSALLFAI V* 



ORF140ng-1 and ORF140-1 show 96.3% identity over a 461 aa overlap:



orf 140ng-l .pep MDGRTQT LSAQTLLG I S AAAI I LI L I L I VKFRI RALLTLVI AS LLT ALATGLPTGS I VN D 
Ml I I 1 4 1 1 I I 1 I I I 1 I I I I I 1 I I 1 t 1 I I I I I ^ I I I 1 I I I * t I I I I 1 I 1 I 1 I | I | 1 I I I 
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MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 

vlvknfggtlggvallvgu;amlgrlvetsggaqsladalirmfgekrapfapgvaslif 

1 I I M I I I I I I I I ( I I I I I I 1 I I M M I I I I I I I I I I I t I I I I I 1 N I I M 11 I 1 I I I 



10 



20 



or f 1 4 o- 1 ilvknfggtlggvallvglgamlgrlvetsggaqsladalirmfgekrapfalgvaslif 

orfl40nq-l pep gfpiffdaglivmlpivfatarrmkqdvlpfalasvgafsvmhvflpphpgpiaasefyg 

erf 140-1 gfpiffdaglivmlpivfatarrmkqdvlpfalasigafsvmhvflpphpgpiaasefyg 

orfl40ng-l.pep anigqvlilglptafitwyfsgymlgkvlgraihvpvpellsggtqdsdppkepakagtv 

orti4ung i. W ,,,,,,,,,,,, , , , | , | | | | M I : M I I 1 1 1 1 M I I II I : I M 

orf 140-1 anigqvlilglptafitwyfsgymlgkvlgrtihvpvpellsggtqdndlpkepakagtv 

1 5 or f 1 40nq-l . pep VAVMLI PMLLI FLNTGVSALI SEKLVSADETWVQTAKMIGST PVALLI SVLAALLV LGRK 

9 P P lliMMI Mill III II Mill Mil MM I II III: IHI I: IHIIII:ll: I III I 
or5140-l vaimlipmlliflntgvsaliseklvsadetwvqtakiigstpiallisvlvalfvlgrk 

orn40ng-l.pep RGESGSTLEKTVDGALAPACSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 
0 9 P P | | ,| i | : | || M M I I II : I M II I III I I I I I M M M I I II M I II I M II M I M M 

orf 140-1 RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 

orfl40nq-l pep FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQIACIVIATAAGSVGCSHETJDSG 
9 ' W | | | | 1 1 | | | M I I II I M I II I M M I I I II M M II M I I M I I M M I I M M M II I 
25 orf 140-1 FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 

orf 140ng-l .pep FWLVGRLLDMDVPTTLKTWTVNQTLIAFIGFALSALLFAIV 
I M I I M I II I I I M I I I I I I I 1 I I I I ^ I I I I I i n 1 I 1 M 
orf 1 40-1 FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 

Furthermore, ORF140ng-1 is homologous to an E. coli protein:

gi|882633 (U29579) ORF_o454 [Escherichia coli] >gi|1789097 (AE000358) o454;
This 454 aa ORF is 34% identical (9 gaps) to 444 residues of an approx. 456 aa
protein GNTP_BACLI SW: P46832 [Escherichia coli] Length = 454
Score = 210 bits (529), Expect = 1e-53

Identities = 130/384 (33%), Positives = 194/384 (49%), Gaps = 19/384 (4%)

Query 88 ETSGGAQSLADALIRMFGEKRAPFAPGVASLIFGFPIFFDAGLIVMLPIVFATARRMKQD 147 

E SGGA+SLA+ R G+KR A +A+ G P+FFD G I++ PI*+ A+ K 
SbjCt: 80 EHSGGAESLANYFSRKLGDKRTIAALTLAAFFLGIPVFFDVGFIILAPIIYGFAKVAKIS 139 

Query 148 VLPFALASVGAFSVMHVFLPPHPGPIAASEFYGANIGQVLILGLPTAFITWYFSGYHLGK 207 

L F L G +HV +PPHPGP+AA+ A+IG + I+G+ +1 GY K 

Sbjct: 140 PLKFGLPVAGIMLTVHVAVPPHPGPVAAAGLLHADIGWLTIIGIAIS-IPVGWGYFAAK 198 

45 Ouerv 208 VLGRAIHVPVPELL SGGTQDSDP PKE P AKAGT WAVML I PM LL I FLNTG V 257 

yuery. ++ + + E+L G T+ SD P A V ++++IP+ +1 T 

SbjCt: 199 I INKRQYAMSVEVLEQMQLAPASEEGATKLS DKIN PPGVA-LVTSLI VI PIAI IMAGT 255 

Query 258 SALISEKLVSADETWVQTAKMIGSTPKXXXXXXXXXXXXXGRKRGESGSTLEKTVDGALA 317 
50 +S L+ + T ++IGS + RG S + AL 

SbjCt: 256 VSATLMPPSHPLLGTLQLIGSPMVALMIALVLAFWLLALRRGWSLQHTSDIMGSALP 312 

Query 318 PACSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGCFLVALALRIAQGSXXXX 377 
A VIL+TGAGG+FG VL SG+GKALA+ + + +P+L F+++LALR +QGS 
55 SbjCt: 313 TAAWILVTGAGGVFGKVLVESGVGKALANMLQMI DLPLLPAAFI I SLALRASQG S AT 370 

Ouerv 378 XX^X^CXXXXXXXXXGFTDWQIJVCIVIATAAGSVGCSHFNDSGFWLVGRLLDMDVPTTLK 437 
Vy ' GQ+LAG+GSH NDSGFW+V + L + V LK 

Sbjct: 371 VAILTTGGIXSEAVMGLNPIOCVLVTLAACFGGLGASHINDSGFWIVTKYLGLSVADGLK 430 
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60 



Query: 438 TWTVNQTLIAFIGFALSALLFAIV 4 61 

TWTV T++ F GF ++ ++A++ 
Sbjct: 431 TWTVLTT I LG FTG FL ITWCVW AV I 454 



Based on this analysis, including the identification of a putative leader sequence
(double-underlined) and several putative transmembrane domains (single-underlined) in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 71 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 593>:

1 ..GATTTCGGCA TATCGCCCGT GTATCTTTGG GTTGCCGCCG CGTTCAAACA
51 TTTGCTGTCG CCGTGGGCTG CCGACTCATA CGATGTCGCA CGCTTTGCAG
101 GCGTATTTTT TGCCGTTATC GGACTGACTT CCTGCGGCTT TGCCGGTTTC
151 AACTTTTTGG GCAGACACCA CGGGCGCAC. GTCGTCCTGA TTCTCATCGG
201 CTGTATCGGG CTGATTCCAG TTGCCCATTT CCTCAACCCC GCTGCCGCCG
251 CCTTTGCCGC CGCCGGACTG GTGCTGCACG GTTATTCTTT GGCTCGCCGG
301 CGCGTGATTG CCGCCTCTTT TCTGCTCGGT ACGGGCTGGA CGCTGATGTC
351 GTTGGCAGCA GCTTATCCGG CAGCATTTGC CCTGATGCTG CCCTTGCCCG
401 TACTGATGTT TTTCCGTCCG

This corresponds to the amino acid sequence <SEQ ID 594; ORF141>: 

1 ..DFGISPVYLW VAAAFKHLLS PWAADSYDVA RFAGVFFAVI GLTSCGFAGF
51 NFLGRHHGRX VVLILIGCIG LIPVAHFLNP AAAAFAAAGL VLHGYSLARR
101 RVIAASFLLG TGWTLMSLAA AYPAAFALML PLPVLMFFRP..

Further work revealed the complete nucleotide sequence <SEQ ID 595>: 

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA
51 AAAGCCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG
101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC
151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG
201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT
251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACTCATACGA TGCCGCACGC
301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCCT GCGGCTTTGC
351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAgCGTC GTCCTGATTC
401 TCATCGGCTG TATCGGGCTG ATTCCAGTTG CCCATTTCCT CAACCCCGCT
451 GCCGCCGCCT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC
501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGCTGGACGC
551 TGATGTCGTT GGCAGCAGCT TATCCGGCAG CATTTGCCCT GATGCTGCCC
601 TTGCCCGTAC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT
651 GACGGCAGTC GCCTCACTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC
701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC
751 TATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACGTTC AGACGGCATT
801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCCGCGC
851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC
901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC
951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC
1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG
1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT
1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG
1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC
1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT
1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG
1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG
1351 GACGCGGCGA AAAGCCACGC GCCGGTCGTC CGGAGTATGG AGGCATCGCT
1401 TTCCCCGGAA TTGAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA
1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA
1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCTCCT
1551 GCCCCAAAAT GCGGATGCGC CGCAAGGCTG GCAGACGGTT TGGCAGGGTG
1601 CGCGTCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAATCGGG
1651 GAAAATATAT AA

This corresponds to the amino acid sequence <SEQ ID 596; ORF141-1>:

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPDEPAVYTA
51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADSYDAAR
101 FAGVFFAVIG LTSCGFAGFN FLGRHHGRSV VLILIGCIGL IPVAHFLNPA
151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP
201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD
251 YHVFGTFGGV RHVQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD
301 WGILGVVWML AVLVLLAVNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP
401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL
451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT
501 LPHRVGDVQC RYRIVLLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG
551 ENI*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)
ORF141 shows 95.0% identity over a 140 aa overlap with an ORF (ORF141a) from strain A of N. meningitidis:

10 20 30 

orfl41 pep DFGISPVYLWVAAAFKHLLSPWAADSYDVA 

I I 1 I I I II I I I I I I ! I I I I I 1 I I I I I : I 
1 5 orf 1 41a WNPDEPAVYTAVEALAGSPTPLVAHLFGOIDFGIPPVYLWVAAAFKHLLSPWAADPYDAA 

40 50 60 70 80 90 

40 50 60 70 80 90 

orf 141 pep R FAGVFFAVIGLTSCGFA GFNFLGRHHGRX WLILIGCIGLIPVAHF LNPAAAAFAAAGL 

20 ^ * * iTTTTTTTTTTTTTTTTTi i i 1 1 1 1 1 1 1 1 1 1 1 1 1 m i m 1 1 :: 1 1 1 1 1 u 1 1 n 1 1 1 1 

orfl41a R FAGVFFAWGLTSCGFA GFNFLGRHHGRS WLILIGCIGLIPTVHF LNPAAAAFAAAGL 
100 UO 120 130 140 150 

100 110 120 130 140 

25 orf 141 pep VLHGYSLARRR VIAASFLLGTGWTLMSL AAA YPAAFALMLPLPVLMFF RP 

in 1 1 1 1 1 1 1 iTITTTTTTTTTTTTmT 1 1 1 1 1 1 1 1 1 1 M 1 1 M 1 1 1 1 1 1 

or f 1 4 la VIJiGYSIARRR VIAASFLLGTGWTLMSL AAA YPAAFALMLPLPVLMFF RPWQSRRLMLTA 
160 170 180 190 200 210 

30 orf 14 la VASLAFALPLMTV YPLLLAKTQPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLWF 

220 230 240 250 260 270 

The complete length ORF141a nucleotide sequence <SEQ ID 597> is:

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAGCCGTGG CTGTTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG 

35 101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCT TTGGTTGCCC ATCTGTTCGG 

201 TCAAATCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT 

251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACCCGTATGA TGCCGCACGC 

301 TTTGCCGGCG TGTTTTTCGC CGTTGTCGGA CTGACTTCCT GCGGCTTTGC 

40 351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTC GTCCTGATTC 

401 TCATCGGCTG TATCGGGCTG ATTCCGACCG TACACTTTCT CAACCCCGCT 

451 GCCGCCGCCT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC

501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGTTGGACGC 

551 TGATGTCGTT GGCAGCAGCT TATCCGGCGG CATTTGCCCT GATGCTGCCC 

45 601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC 

751 GATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACATTC AGACGGCATT 

801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCTGCGC 

50 851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC 

951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGACG CGGCGCGGCG 

1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG

1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGCAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

60 1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGCT 

1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGACA 

1451 TAGGCGGCGG CGACCTACAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCGCTT 

1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 
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1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAACCGGG 
1651 GAAAATATAT TAAAAACAAC AGATTGA 



This encodes a protein having amino acid sequence <SEQ ID 598>:

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPDEPAVYTA
51 VEALAGSPTP LVAHLFGQID FGIPPVYLWV AAAFKHLLSP WAADPYDAAR
101 FAGVFFAVVG LTSCGFAGFN FLGRHHGRSV VLILIGCIGL IPTVHFLNPA
151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP
201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD
251 DHVFGTFGGV RHIQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD
301 WGILGVVWML AVLVLLAVNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP
401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL
451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIDIGGGDLH TRIVWTQYGT
501 LPHRVGDVQC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKTG
551 ENILKTTD*


ORF141a and ORF141-1 show 98.2% identity in 553 aa overlap: 






f 141a . pep MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDE PAVYTAVEALAGSPTP 
i I M I I I I I t t I M I I I I I I I I I I I I M I I I I I I I I 1 M I I M (I I I I I I I I t I I II I I i 
or f 14 1-1 MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP 

orfl41a pep LVAHLFGQI DFGI PPVYLWVAAAFKHLLSPWAADPYDAARFAG VFFAWGLTSCGFAGFN 
I I I I I I I I I I I I II II II I I I I I I I I II I I 1 1 I I I I H I I I M I I I : M I I t I I I II 1 
orfl41-l LVAHLFGQTDFGI PPVYLWVAAAFKHLLS PWAADS YDAARFAGV FFAV I GLTSCG FAG FN 

orf 14 la . pep FLGRHHGRSWLILIGCIGLIPTVHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT 
I t I I I I E I 1 I I 1 I t 1 } I t I t I 1 1 : | ( 1 | | | | | | 1 1 1 1 I I I 1 t I 1 Ml II I III! HUM 
orf 141-1 FLGRHHGRSWLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT 

orf 14 la . pep GWTI^SU^YPAAFAIMLPLPVLMFFRPWQSRRI2aTAVASLAFALP^ 

in in ii in mi ill inn i ii 1 1 illinium mi mi 1 1 1 1 n 1 1 1 1 1 1 1 1 

orf 141-1 GWTIMSLAAAYPAAFAimPLPVI^FFRPWQSRRI^TAVASIAFALPLMT^^ 

orf 14 la . pep QPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLWFAL PAL PLAVWT VCRTRLFSTD 
Hllltllll 1 1 I I I I 1 I I 1 1 r I 1 I 1 I I 1 I I 1 I I I 1 I I I I S I 1 ! 1 t f 1 t 1 1 1 1 I I I I I I 
or f 1 4 1 - 1 QPALFAQWLD YHVFGTFGGVRH VQTAFS LFYYLKNLLW FALPAL PLAVWTVCRTRLFST D 

orf 141a. pep WG I LG WWMLAVLVLLAVN PQRFQDN LVWLL PPLALFGAAQLD S LRRGAAAFVNW FG IMA 
I || II I I I II II I II I I i I II I I I I I II II I ! II I I I II I I I I II I I I I I I i I I I I I I I I 
orf 1 4 1- 1 WGILGWWMIAVL\n.LAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 

or f 14 la . pep FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDI DPI PMAVAVLFT PLWLWAITRK 

in in inn in ii mil iimi ill ii inn mil ii 1 111 m ii inn in 

orf 14 1-1 FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDI DPI PMAVAVLFT PLWLWAITRK 

orf 141a . pep NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPWRSMEASLSPELKRELSDGIE 
1 1 1 1 1 1 1 III 1 1 M 1 1 M 1 1 1 1 1 1 1 I 1 1 1 M 11 ! 1 1 1 1 1 1 II I I I M M M 1 1> 1 1 1 1 M 

orf 1 4 1- 1 N I RGRQAVTNWAAG VT LTWALLMT LFL P WLD AAK SHAP WR S ME AS LS PELKRE L S DG I E 

orf 14 la . pep CIDIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVRLPQNADAPQGWQTVWQGARPRNKD 

ii imiiiiiiimtimmmimm 1 1 1 1 m in 1 1 1 m m m n 

or f 1 4 1- 1 CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD 
orf 1 4 la . pep SKFALIRKTGENI 

1 1 1 ii ii i mi 

orf 1 4 1 - 1 SKFALIRKIGENI 

Homology with a predicted ORF from N. gonorrhoeae 

ORF141 shows 95% identity over a 140 aa overlap with a predicted ORF (ORF141ng) from
N. gonorrhoeae:
orf 141. pep 
orf 141ng 



DFG I SPVYLWVAAAFKH LLS PWAADS YDVA 30 

mi immimmiim mi 

WNPAEPAVYTAVEALAGSPTPLVAHLFGQTDFGI PPVYLWVAAAFKHLLS PWAAHPYDAA 126 
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orfl41 pep MWWmvi&TSW^ 90 
ortm-p^ in mm n mm I III II 11 1 II llll 1 1 1 I ! I I 1 1 M I - 1 1 1 1 1 1 i 1 1 1 1 1 

orfl41ng RFA^FFAVIGLTSCGF^ 186 

VLHGYSU^RRVIAASFLLGTGWTI^SIJ^YPAAFAI^LPLPVI^F^? 140 
| | | | | | | M | | | | | | | | | | II I I I I I I I I I I I I I I 1 I I I II I M II I I I I 

VLHGYSIAMIRVI 246 



orfl41.pep 



orf!41ng 



An ORF141ng nucleotide sequence <SEQ ID 599> was predicted to encode a protein having amino
acid sequence <SEQ ID 600>:

1 MPSEAVSARP LCEYLLHLAI RPFLLTLMLT YTPPDARPPA KTHEKPWLLL
51 LMAFAWLWPG VFSHDLWNPA EPAVYTAVEA LAGSPTPLVA HLFGQTDFGI
101 PPVYLWVAAA FKHLLSPWAA HPYDAARFAG VFFAVIGLTS CGFAGFNFLG
151 RHHGRSVVLI HIGCIGLIPV AHFFNPAAAA FAAAGLVLHG YSLARRRVIA
201 ASFLLGTGWT LMSLAAAYPA AFALMLPLPV LMFFRPWQSR RLMLTAVASL
251 AFALPLMTVY PLLLAKTQPA LFAQWLNYHV FGTFGGVRHI QRAFSLFHYL
301 KNLLWFAPPG LPLAVWTVCR TRLFSTDWGI LGIVWMLAVL VLLAFNPQRF
351 QDNLVWLLPP LALFGAAQLD SLRRGAAAFV NWFGIMAFGL FAVFLWTGFF
401 AMNYGWPAKL AERAAYFSPY YVPDTDPIPM AVAVLFTPLW LWAITRKNIR
451 GRQAVTNWAA GVTLTWALLM TLFLPWLDAA KSHAPVVRSM EASFSPELKR
501 ELSDGIECIG IGGGDLHTRI VWTQYGTLPH RVGDVRCRYR IVRLPQNADA
551 PQGWQTVWQG ARPRNKDSKF ALIRKIGENI LKTTD*

Further work revealed the following gonococcal DNA sequence <SEQ ID 601>: 

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAACCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGCTG TGGCCCGGCG 

95 101 TGTTTTCCCA CGATTTGTGG AATCCTGCCG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 

201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCAT 

251 TCAAACATTT GCTGTCGCCG TGGGCAGCCG ACCCGTATGA TGCCGCACGC 

301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCTT GCGGCTTTGC

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTT GTTTTAATCC

401 ATATCGGCTG TATCGGGCTG ATTCCGGTTG CCCATTTCCT CAATCCcgcc

451 gccgccgcct tTGCCGCCGC CGGACTGGTG CTGCacggct actcgctgGC

501 ACGCCGGCGC GTGATtgccg CCtCtTtCCT GCTCGGTACG GGTTGGACGT

551 TGATGTCGCT GGCGGCAGCT TATCCGGCGG CGTTTGCGCT GATGCTGCCC

601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCtt gGCAAAAACG CAGCCCGCGC TGTTTGCGCA ATGGCTCAAC 

751 TATCACGTTT TCGGTACGTt cggcgGCGTG CGGCAcaTTC AGAggGCatT 

801 Cagtttgttt cactatctgA AAaatctgct ttggttcgca ccgcccgggC 

40 851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CACGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCATTGT CTGGATGCTT GCCGTTTTGG TGCTGCTCGC 

951 CTTTAATCCG CAGCGTTTTC AAGACAACCT CGTCTGGCTG CTGCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG

1051 GCTTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGGCTGT TTGCCGTGTT

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG

1151 CCGAACGCGC CGCCTACTTC AGCCCGTATT ACGTTCCCGA CATCGATCCC 
1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 
1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 
1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

50 1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGTT 

1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA 
1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA 
1501 TTGCCGCACC GCGTCGGCGA TGTCCGTTGC CGCTACCGTA TCGTCCGCCT 
1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 

55 1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTTG CACTGATACG GAAAATCGGG 

1651 GAAAATATAT TAAAAACAAC AGATTGA 

This corresponds to the amino acid sequence <SEQ ID 602; ORF141ng-1>:

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPAEPAVYTA
51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADPYDAAR
101 FAGVFFAVIG LTSCGFAGFN FLGRHHGRSV VLIHIGCIGL IPVAHFLNPA
151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP
201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLN
251 YHVFGTFGGV RHIQRAFSLF HYLKNLLWFA PPGLPLAVWT VCRTRLFSTD
301 WGILGIVWML AVLVLLAFNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP
401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL
451 DAAKSHAPVV RSMEASFSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT
501 LPHRVGDVRC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG
551 ENILKTTD*

ORF141ng-1 and ORF141-1 show 97.5% identity in 553 aa overlap: 

orf 1 4 lng-1 . pep MLTYTPPDARPPAKTHEKPWL1J,LMAFAWLWPGVFSHDLWNPAEPAVYTAVEALAGSPTP 
II I 1 I I I I 1 I I I I I I I I I I I M t I II I II I I I I I I I I 1 I I II I f I I I I I I I I I I I I I I I 
orf 1 41-1 MLT YT P P DAR P PAKTHEK PWLLLLMAFAW L W PG V FS H DLWN P DE P AVYTAVEALAG S PT P 

10 

orf 14 lng- 1 . pep LVAHLFGQT D FGI P PVYLWVAAAFKHLLS PWAADPYDAARFAGV FFAV IGLT SCG FAG FN 
I I I I I II I I I I I I I I I I I I I I I I I I I I t I I I II I | I I I I I I I M I I I II I I I I II I I M 
orf 14 1-1 LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN 

15 orf 141ng-l .pep FLGRHHGRSWLIHIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT 

MINIMI III! I I f 11 I 1 I 1 I I 1 I I I I I M M I I M I 1 1 I I I t I i I I M I M I 1 i I I 
orf 141-1 FLGRHHGRSWLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT 



orf 14 lng-1 .pep GWTI^SIJUtflYPAAFALMLPLPVIMFFRPWGSRRLMLTAVASIAFALPLOT 
20 I I I I I I M M I I I I I I I I i I I II I I I M M M I I M I I I I I II M I M I I II I I I II M I 

orf 141-1 GWTI^SLAAAYPAAFALMLPLPVLMFFRPWQSRRI^LTAVASLAFALPLMTVY^^ 

orf 141ng-l .pep QPALFAQWl^YHVFGTFGGWHIQRAFSLFHYLKNLLWFAPPGLPLAVWTVCRTRLFSTD 
M M II II I : I 1 I I I I I I I I II: I II II I : I I I 1 M I I I I : II M M II I I I I I I I M 
25 orf 14 1-1 QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD 

orf 14 lng-1. pep WGIIX5IVWM1JVVLVLIAFNPQRFQDKLVWLLPPIALFGAAQLDSLRRGAAAFVNWFGIMA 
M M I : I I I I I I M I I I I I I I I I I I M II 1 M I I I I I I I I I I M II II II I I I I I I II I 
orf 14 1-1 WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 

30 

orf 14 lng-1. pep FGLFAVFLWTGFFAMNYGWPAKLAERAAYFS PYYVPDI DPI PMAVAVLFT PLWLWAITRK 
I I I I I M I I I I I M I I M I I I I I I I M II M I I I I I I I I I M I I I I I I I II I I I 1 1 I I I I 
orf 141-1 FGLFAVFLWTGFFAMNYGWPAKLAERAAYFS PYYVPDI DPI PMAVAVLFT PLWLWAITRK 

35 orf 141ng-l .pep NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASFSPELKRELSDGIE 

I M M II Ml Mill IMIIMMIMI MM IIIIIIIIMIMI:| MIMIIIMM 
orf 141-1 N I RGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPWRSMEAS LS PE LKRELSDG I E 

orf 141ng-l .pep CIGIGGGDLHTRIVWTQYGTLPHRVGDVRCRYRIVRLPQNADAPQGWQTVWQGARPRNKD 
40 I II I M M I I I I I I I I I I I I I I I II M I : I I II I I I I M I I I II I I I II I I II I I I I I I 

orf 14 1-1 CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD 

orfHlng-l.pep SKFALIRKIGENILKTTDX 
I II M I I I I I I I I 
45 orfl4l-l SKFALIRKIGENIX 

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
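The text does not state here how the putative transmembrane domains were identified. As a hedged illustration only, the sketch below uses a substituted, commonly used technique: a Kyte-Doolittle sliding-window hydropathy average. The function names, the 19-residue window, the 1.6 threshold and the choice to score unknown residues as 0 are assumptions made for this example, not parameters taken from this document.

```python
# Kyte-Doolittle hydropathy index per residue.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq, window=19):
    """Mean hydropathy of every full window; unknown residues (e.g. X) score 0."""
    seq = "".join(seq.split()).upper().rstrip("*")
    scores = [KD.get(aa, 0.0) for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_starts(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean exceeds the threshold."""
    return [i for i, h in enumerate(hydropathy_windows(seq, window))
            if h > threshold]


# A hydrophobic stretch near the N-terminus of ORF141ng-1 as listed above.
fragment = "PWLLLLMAFAWLWPGVFSHDLWN"
print(candidate_tm_starts(fragment))
```

A window that exceeds the threshold only flags a candidate segment; it does not by itself establish membrane topology.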



Example 72 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 603>: 

1 . . CAATCCGCCA AATGGTTATC GGGCCAAACT CTAGTCGGCA CAGCAATTGG 

51 GATACGCGGG CAGATAAAGC TTGGCGGCAA CCTGCATTAC GATATATTTA 

101 CCGGCCGCGC ATTGAAAAAG CCCGAATTTT TCCAATCAAG GAAATGGGCA 

151 AGCGGTTTTC AGGTAGGCTA TACGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 604; ORF142>: 

1 ..QSAKWLSGQT LVGTAIGIRG QIKLGGNLHY DIFTGRALKK PEFFQSRKWA 
51 SGFQVGYTF* 
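The correspondence between a listed DNA sequence and its amino acid sequence can be checked with a short translation routine. The following is a minimal sketch, assuming the standard genetic code and a reading frame that starts at the first listed base; the names `translate` and `CODON_TABLE` are illustrative, and any codon containing a character other than A, C, G or T is reported as X.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate a DNA string; whitespace is ignored, unknown codons give X."""
    dna = "".join(dna.split()).upper()
    residues = []
    for i in range(frame, len(dna) - 2, 3):
        residues.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(residues)


# Last row (bases 151-180) of the partial sequence <SEQ ID 603>.
print(translate("AGCGGTTTTC AGGTAGGCTA TACGTTTTAA"))  # SGFQVGYTF*
```

The printed residues match the final row of <SEQ ID 604>, with `*` marking the stop codon.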

Further work revealed the complete nucleotide sequence <SEQ ID 605>: 

1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC 
51 TTTCTCTGCC GACAATCCTT TGGGACTGAG TGATATGTTC TATGTAAATT 
101 ATGGACGTTC CATTGGCGGT ACGCCCGATG AGGAAAGTTT TGACGGCCAT 
151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 
201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 
251 CAGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAT 
301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 
351 CTATCTCGGT GTAAAACTGT GGATGAGGGA AACAAAAAGT TACATTGATG 
401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CTGCGGGTTG GTTGGCAGAA 
451 CTTTCCCACA AAGAATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA 
501 ATATAAACGC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 
551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 
601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 
651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 
701 GTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 
751 TCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 
801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 
851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGTCGGCAC AGCAATTGGG 
901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 
951 CGGCCGCGCA TTGAAAAAGC CCGAATTTTT CCAATCAAGG AAATGGGCAA 
1001 GCGGTTTTCA GGTAGGCTAT ACGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 606; ORF142-1>: 

1 MDNSGSEATG KYQGNITFSA DNPLGLSDMF YVNYGRSIGG TPDEESFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYKGKSYN 

101 TDFGFNRLLY RDAKRKTYLG VKLWMRETKS YIDDAELTVQ RRKTAGWLAE 

151 LSHKEYIGRS TADFKLKYKR GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 SAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLVGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEFFQSR KWASGFQVGY TF* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. gonorrhoeae 

ORF142 shows 88.1% identity over a 59aa overlap with a predicted ORF (ORF142ng) from 
N. gonorrhoeae: 

orfl42 pep QSAKWLSGQTLVGTAIGIRGQIKLGGNLHY 30 

I II I I I II I I I : I I I I I I I I I I II I H H I 
orfl42ng RGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIGIRGQIKLGGNLHY 313 

orf 142 .pep DIFTGRALKKPEFFQSRKWASGFQVGYTF 59 

I I 1 II I 1 1 ! Ill : I I : : I I : : I I I Ml : I 
orfl42ng DIFTGRALKKPEYFQTKKWVTGFQVGYSF 342 

The complete length ORF142ng nucleotide sequence <SEQ ID 607> is: 

1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC 

51 TTTCTCTGCC GACAATCCTT TTGGACTGAG TGATATGTTC TATGTAAATT 

101 ATGGACGTTC AATTGGCGGT ACGCCCGATG AGGAAAATTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CGGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAC 

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCAGT GTAAAACTGT GGACGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CCACAGGTTG GTTGGCAGAA 

451 CTTTCCCACA AAGGATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA 

501 ATATAAACAC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 CCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGCCGGCAC AGCAATTGGG 

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 

951 CGGCCGTGCA TTGAAAAAGC CCGAATATTT TCAGACGAAG AAATGGGTAA 

1001 CGGGGTTTCA GGTGGGTTAT TCGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 608>: 

1 MDNSGSEATG KYQGNITFSA DNPFGLSDMF YVNYGRSIGG TPDEENFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLS VKLWTRETKS YIDDAELTVQ RRKTTGWLAE 

151 LSHKGYIGRS TADFKLKYKH GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 PAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLAGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEYFQTK KWVTGFQVGY SF* 

The underlined sequence (the aromatic-Xaa-aromatic amino acid motif, here the C-terminal residues Y-S-F) is usually found at the 
C-terminal end of outer membrane proteins. 
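The motif check described above can be expressed in a few lines. This is a sketch under stated assumptions: the aromatic residues are taken to be Phe, Trp and Tyr (histidine is not counted), the motif is read as the last three residues before the stop marker, and the function name is hypothetical.

```python
AROMATIC = set("FWY")  # assumption: Phe, Trp, Tyr only


def has_c_terminal_aromatic_motif(protein):
    """True if the last three residues read aromatic, any residue, aromatic."""
    seq = "".join(protein.split()).upper().rstrip("*")
    if len(seq) < 3:
        return False
    return seq[-3] in AROMATIC and seq[-1] in AROMATIC


# Final row of <SEQ ID 608> (ORF142ng), which ends in Y-S-F.
print(has_c_terminal_aromatic_motif("IRGQIKLGGN LHYDIFTGRA LKKPEYFQTK KWVTGFQVGY SF*"))
```

A positive result is only consistent with an outer membrane location; it is not proof of one.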

ORF142ng and ORF142-1 show 95.6% identity over 342aa overlap: 

orf 142-1 .pep MDNSGSEATGKYQGNITFSADNPLGLSDMFYVNYGRSIGGTPDEESFDGHRKEGGSNNYA 
| | 1 1 I I I I I I I I | I 1 I 1 I I I 1 I I t I I I I I I I I I I I I f I I I 1 I I I I ^ 1 1 I I I I I I I I I I I I 
15 orfl42ng-l MDNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYA 



20 



orf 142-1 . pep VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLG 

I I I I I 1 1 1 1 1 1 I 1 1 1 1 1 1 1 I I I I I 1 lllltl) IMIMMIIIIIMIMIII: 

orfl42ng-l VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLS 

orf 142-1 . pep VKLWMRETKSYIDDAELTVQRRKTAGWLAELSHKEYIGRSTADFKLKYKRGTGMKDALRA 

IN I Hill I III I I I I: I I II I Mil M II M M M II I I : M I M M I I I 

orf 1 4 2ng- 1 VKLWTRETKSYI DDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRA 

25 orf 142-1 .pep PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT 

M I M I M M M I M I M II II I I M M M I M I I I I I I I I I II I I I I II II M M M I I 
orfl42ng-l PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT 

orf 142-1 .pep VRGFDGEMSLSAERGWYWRNDLSWOFKPGHQLYLGADVGHVSGQSAKWLSGQTLVGTAIG 
30 I I M II M II M M M I M I I I I II M M I I II I M I I I I M M I II I I I II 1 : 11 I I I 

orfl42ng-l VRGFDGEMSLPAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIG 

orfl42-l .pep IRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF 
I I M MM I M I I M I MM I I M I : II : : M::M IM I: I 
35 orfl42ng-l IRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF 
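A percent-identity figure of the kind quoted for these alignments can be recomputed from two aligned sequences of equal length. The sketch below is illustrative: it assumes identity is counted over all aligned columns (gap columns included in the denominator), that `-` is the gap character, and the function name is hypothetical. The alignment program used in this document may count columns differently.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (identical columns, overlap length, percent identity)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if not (a == gap and b == gap)]
    if not pairs:
        raise ValueError("no aligned columns")
    matches = sum(1 for a, b in pairs if a == b and a != gap)
    return matches, len(pairs), 100.0 * matches / len(pairs)


# Last block of the ORF142-1 / ORF142ng alignment, taken from
# <SEQ ID 606> and <SEQ ID 608>.
orf142_1 = "IRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF"
orf142ng = "IRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF"
matches, overlap, pct = percent_identity(orf142_1, orf142ng)
print(matches, overlap, round(pct, 1))
```

This block covers only the final 42 residues, so its figure is lower than the 95.6% reported over the full 342 aa overlap.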

In addition, ORF142ng is homologous to the HecB protein of E. chrysanthemi: 

gi|1772622 (L39897) HecB [Erwinia chrysanthemi] Length = 558 
Score = 119 bits (295), Expect = 3e-26 
Identities = 88/346 (25%), Positives = 151/346 (43%), Gaps = 22/346 (6%) 

40 

Query: 2 DN SG SEATGKYQGN IT F S ADN P FGL S DMFYVN YGRS I GGT PDEEN FDGHRKEGG SNN Y AV 61 

DNSG ++TG+ Q N + + DN FGL+D ++++ G S + + D + G 
Sbjct: 230 DNSGQKSTGEEQLNGSLALDNVFGLADQWFISAGHS— SRFATSHDAESLQAG 280 

45 Query: 62 H Y SAP FGKWTW A FN HN G YR YHQAV SG LS E VY D YN GKS YN T DFG FNRLL YRD AKRKT Y LS V 121 

+S P+G W +N++ RY + G S F +R+++RD KT ++ 

Sbjct: 281 -FSMPYGYWNLGYNYSQSRYRNTFINRDFPWHSTGDSDTHRFSLSRWFRDGTMKTAIAG 339 

Query: 122 KLWTRETKS Y I DDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRAP 181 
50 R +Y++ + L RK + ++H + A F Y G + 

Sbjct: 340 TFSQRTGNNYLNGSLLPSSSRKLSSVSLGVNHSQKLWGGLATFNPTYNRGVRWLGSETDT 399 

Query: 182 EEAFGEGTSRMKIWTASADWTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHTV 241 
+ ++ E + WT SA P Y S++ Q++ L ++L +GG ++ 

55 Sbjct: 400 DKSADEPRAEFNKWTLSASYYHPV TDSITYLGSLYGQYSARALYGSEQLTLGGESSI 456 

Query: 242 RGFDGEMSLPAERGWYWRNDLSWQFKP GHQLYLGA-DVGHVSGQSAKWLSGQTLAG 296 

RGF E RG YWRN+L+WQ G+ ++ A D GH+ + +L G 

Sbjct: 457 RG F-REQYTSGNRGA YWRN E LNWQAWQL PVLGNVT FMAAVDGGKLYNHKQDN ST AAS LWG 515 



60 



Query: 297 TAIGIRGQIKLGGNLHYDI FTGRALKKPEYFQTKKWVTGFQVGYSF 342 
A+G+ + L+G + P + Q V G++VG SF 



Sbjct: 516 GAVGMTVASRW LSQQVTVGWPISYPAWLQPDTMVVGYRVGLSF 558 

On the basis of this analysis, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 73 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 609>: 

1 ATGCGGACGA AATGGTCAGC AGTGAGAAGC TGCTJACTTG GgCGGACACC 

51 GCCGACATCG ATACCGCTTT GAACCTGTTG TACCGTTTGC AAAAACTCGA 

101 ATTCCTCTAT GGCGATGAAA ACGGTCATTC AGACGGCATC AATTTGwCGG 

151 ACGAGCAATT GCCGTTGCTG ATGGAACAAT TGTCCGGCAG CGGTAAGGCG 

201 TTATTGGTCG ATCGGAACGG TCTGTATCTT GCCAACGCCA ATTTCCATCA 

251 TGAGGCGGCG GAAGAGTTGG GGTTGTTGGC GGCAGAAGTC GCACAGATGG 

301 AAAAGAAATA CCGGCTGCTG ATTAAGAACA AC. 

This corresponds to the amino acid sequence <SEQ ID 610; ORF143>: 

1 MRTKWSAVRS CTWADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLXD 

51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHEAAEELG LLAAEVAQME 
101 KKYRLLIKNN .. 

Further work revealed the complete nucleotide sequence <SEQ ID 611>: 

1 ATGGAATCAA CACTTTCACT ACAAGCAAAT TTATATCCCC GCCTGACTCC 

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCA CAGCCTGTTG AAAGCAGATG CGGACGAAAT GGTCAGCAGT 

151 GAGAAGCTGC TTACTTGGGC GGACACCGCC GACATCGATA CCGCTTTGAA 

201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG 

251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG 

301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT 

351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT 

401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCTGATT 

451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC 

501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT 

551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT 

601 ACTTTGGTAA GGATTTTATA CCGCCGTTAC AGCAACCGCG TGTAA 

This corresponds to the amino acid sequence <SEQ ID 612; ORF143-1>: 

1 MESTLSLQAN LYPRLTPAGA FYAVSSDAPS AGKTLLHSLL KADADEMVSS 

51 EKLLTWADTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM 

101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLLI 

151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV 

201 TLVRILYRRY SNRV* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF143 shows 92.4% identity over a 105aa overlap with an ORF (ORF143a) from strain A of N. meningitidis: 

10 20 30 

or ? 143 Deo MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFL 
" W |: : III Ml III III III III II Ml 

45 orfl43a GAFYAVSSDXPSAGKTLIJISLLKADADEMVSSEKLLTWAXTADIDTAI^I^YRLQKLEFL 

20 30 40 50 60 70 

40 50 60 70 80 90 

orfl43 pep YGDENGHSDGINLXDEQLPLLMEQLSGSGKALLVDRNGLYIJ^ANFHHEAAEELGLLAAE 

50 n it i n in H i 1 1 it I ii i MiMiimiiMiiiiiiiiiiiiiiiiiii 
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YGDENGHSDGINLSDE0LPL1MEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE 
80 90 100 I 10 120 130 

100 HO 
VAQMEKKYRLLIKNN 

VAQMEKKYRLXI^LYINNNAH^ PDLGKEA 
140 150 160 170 180 190 



The complete length ORF143a nucleotide sequence <SEQ ID 613> is: 

1 ATGGAATCAA CANTTTCACT ACAAGCAAAT TTATATCNCC GCCTGACTCC 
51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGNCCCCAGT GCCGGTAAAA 
101 CTTTGTTGCA CAGCCTGTTG AAAGCGGATG CGGACGAAAT GGTNAGCAGT 
151 GAGAAGCTGC TTACCTGGGC GGANACCGCC GACATCGATA CCGCTTTGAA 
201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG 
251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG 
301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT 
351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT 
401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCNNATT 
451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC 
501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT 
551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT 
601 ACTTTGGTAA GGATNTTATA CCNCCNGTTA CAGCAACCGC GTGTAAAACT 
651 TGGGAGAGAG GANGGGTTAT GCAGCAATTA TTGA 



This encodes a protein having amino acid sequence <SEQ ID 614>: 

1 MESTXSLQAN LYXRLTPAGA FYAVSSDXPS AGKTLLHSLL KADADEMVSS 
51 EKLLTWAXTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM 
101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLXI 
151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV 
201 TLVRXLYXXL QQPRVKLGRE XGLCSNY* 



ORF143a and ORF143-1 show 97.1% identity in 207 aa overlap: 

orf 143a . pep MESTXSLQANLYXRLTPAGAFYAVSSDXPSAGKTLLHSLLKADADEMVSSEKLLTWAXTA 
Ml) II Hill I II HIM III M I I I 1 I I f 1 1 I 1 I I I I I I I 1 1 I I I I I I I i I * il 
orf 143-1 MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA 

orf 143a. pep DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 

mMMiMmiiiiiiiiiiiiMMimiiiiiiiiiii mi mmmiMi 

orfl4 3-l DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 

orf 143a. pep NANFHHEAAEEIXSLIJU^VAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 

MM! I M M II 1 M I M M I II IMMI II I I II I I M M 1 1 M I M I 

orf 143-1 NANFHHEAAEELGLIAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 

orf 143a . pep STKFILVIGG I PDLGKEAFVTLVRXLY 
II II M I I II II II I I I M II I II II 
orf 14 3-1 STKFILVIGG I PDLGKEAFVTLVRILY 

Homology with a predicted ORF from N. gonorrhoeae 

ORF143 shows 95.5% identity over a 110aa overlap with a predicted ORF (ORF143ng) from 
N. gonorrhoeae: 

orf143.pep  MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLXDEQLPLLMEQL 60 
M I | || M M I : II I II M I i M I M I II I M M M I M I I M I M I IMMIMIM 
orf143ng    MRTKWSAVRSCSRADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQL 60 

orf143.pep  SGSGKALLVDRNGLYLANANFHHEAAEELGLLAAEVAQMEKKYRLLIKNN 110 
M I I II M II M M I M I M I I M : I M I I I M M I II I I I I I I I 1 M 1 I 
orf143ng    SGSGKALLVDRNGLYLANANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGV 120 



An ORF143ng nucleotide sequence <SEQ ID 615> was predicted to encode a protein having amino 
acid sequence <SEQ ID 616>: 

1 MRTKWSAVRS CSRADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLSD 

51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHESAEELG LLAAEVAQME 

101 KKYRLLIRNN LYINNNAWGV CDPSGQSELT FFPLYIGSTK FILVIAGIPD 

151 LSKGGICYFG KDFIPPLQQP RVKLGTGGIM RQLLISILED LNNTSTDIIA 

201 SAVISTDGLP MATMLPSHLN SDRVGAISAT LLALGSRSVQ ELACGELEQV 

251 MIKGKSGYIL LSQAGKDAVL VT-VAKETGRL GLILLDAKRA ARHIAEA1* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 617>: 

1 ATGGAATCAA CACTTTCACT ACAAGCGAAT TTATATCCCT GCCTGACTCC 

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCG CAGCCTGTTG AAAGCGGATG CGGACGAAGT GGTCAGCAGT 

151 GAGAAGCTGC TCGCGGCGGA CACCGCCGAC ATCGATACCG CTTTGAACCT 

201 GTTGTACCGT TTGCAAAAAC TCGAATTCCT CTATGGCGAT GAAAACGGTC 

251 ATTCAGACGG CATCAATTTG TCGGACGAGC AATTGCCGTT GCTGATGGAA 

301 CAATTGTCCG GCAGCGGTAA GGCATTATTG GTCGATCGGA ACGGTCTGTA 

351 TCTTGCCAAC GCCAATTTCC ATCATGAGTC GGCGGAAGAG TTGGGGTTGT 

401 TGGCGGCAGA AGTCGCACAG ATGGAAAAGA AATACCGGCT GCTGATTAGG 

451 AACAACCTGT ATATCAACAA TAACGCTTGG GGCGTTTGCG ATCCTTCCGG 

501 TCAGAGCGAA TTGACATTTT TCCCATTGTA TATCGGTTCA ACCAAATTTA 

551 TTTTGGTTAT CGCCGGCATT CCCGATTTGA GCAAAGAGGC ATTTGTTACT 

601 TTGGTAAGGA TTTTATACCG CCGTTACAGC AACCGCGTGT AA 

This corresponds to the amino acid sequence <SEQ ID 618; ORF143ng-1>: 

1 MESTLSLQAN LYPCLTPAGA FYAVSSDAPS AGKTLLRSLL KADADEVVSS 

51 EKLLAADTAD IDTALNLLYR LQKLEFLYGD ENGHSDGINL SDEQLPLLME 

101 QLSGSGKALL VDRNGLYLAN ANFHHESAEE LGLLAAEVAQ MEKKYRLLIR 

151 NNLYINNNAW GVCDPSGQSE LTFFPLYIGS TKFILVIAGI PDLSKEAFVT 

201 LVRILYRRYS NRV* 

ORF143ng-1 and ORF143-1 show 95.8% identity in 214 aa overlap: 
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orfl43nq-l pep MESTLSLQANLYPCLTPAGAFYAVSSDAPSAGKTLLRSLLKADADEWSSEKLLA-ADTA 
IIIIIIIIHIII I I It t 1 t I J I I I I M i M II 1 1 : M I I 1 I I I I = I t I 1 1 1 1 : IMI 
orf 143-1 MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA 60 

orfl43nq-l pep DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 119 

| 1 1 1 ! I 1 1 1 t 1 1 1 1 I I I 1 I 1 t I I t 1 1 I I 1 1 1 1 I t I I t I I I 1 I K I 1 1 I I I I I I I I f 1 I I I I 
orf 143-1 DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPL3-MEQLSGSGKALLVDRNGLYLA 120 

orf!43nq-l pep NANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNKAWGVCDPSGQSELTFFPLYIG 179 

I I 1 1 1 1 I s 1 1 1 1 1 1 1 1 1 I I I i M I I I II I 1 : 1 1 1 I I M I M I I 1 1 I 1 1 I 1 1 I H 1 1 1 1 1 I 
orf 143-1 NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 180 

orfl43nq-l.pep STKFI L V I AG I P D L S KEA FVT LVR I L YRRY SNR V 213 

III! HI [:| 1111:1111 I i I I I IMIIMIM 
orf 143-1 STKFI LVI GG I PDLGKE AFVTLVRI LYRRYSNRV 214 

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 74 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 619>: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGr 

101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CA. GGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GaCGGGTCAA wTyCCAGCGT 

401 CCGTGGATG.. 

This corresponds to the amino acid sequence <SEQ ID 620; ORF144>: 

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQXAASMTF TTLLALVPVL 
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP XGADMVFDYI NAFREQANRL 
101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVX XQRPWM.. 
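The partial DNA sequence above contains ambiguity symbols (for example `r` at position 100), and the corresponding residue is listed as X. The sketch below shows one way such positions can be examined, assuming the standard IUPAC nucleotide ambiguity codes and the standard genetic code; the function names are hypothetical and the rule "X when more than one residue is possible" is an assumption of this example rather than a stated convention of this document.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def possible_residues(codon):
    """All residues an ambiguous codon could encode."""
    options = [IUPAC[base] for base in codon.upper()]
    return {CODON_TABLE["".join(c)] for c in product(*options)}


def translate_ambiguous(codon):
    """Single residue if unambiguous, otherwise X."""
    residues = possible_residues(codon)
    return next(iter(residues)) if len(residues) == 1 else "X"


# Codon 34 of <SEQ ID 619> spans bases 100-102 and reads rCG.
print(sorted(possible_residues("rCG")), translate_ambiguous("rCG"))
```

In the complete sequence <SEQ ID 621> the same codon reads GCG, so the residue resolves to alanine in ORF144-1.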

Further work revealed the complete nucleotide sequence <SEQ ID 621>: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 
51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 
101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 
151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 
201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 
251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 
301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 
351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC 
401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 
451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATG GTCGGCTCGG TACAGGATGC 
501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 
551 CGACGCTGAC CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 
601 CCAAACCGCT TCGTTCCCGC GCGGCAGGCG TTTGTCGGGG CTTTGGCAAC 
651 AGCGTTTTGT CTGGAAACCG CGCGCTCCCT CTTCACTTGG TATATGGGCA 
701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CGTTTGCCGC CGTGCCGTTT 
751 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 
801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGGCT 
851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 
901 GATGCGGCGC AAAAAGAAGG CAAAGCCTTG CCTGTTCAGG AGTTCAGACG 
951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 
1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 
1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 
1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 
1151 TGACACCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 
1201 CAGGCGAAAA AACGGCAGTA G 

This corresponds to the amino acid sequence <SEQ ID 622; ORF144-1>: 

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL 
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL 
101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP 
151 LSLGVGISFM VGSVQDAALA SGAPQWSGAL RTAATLTFMT LLLWGLYRFV 
201 PNRFVPARQA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF 
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 
301 DAAQKEGKAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT 
351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 
401 QAKKRQ* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF144 shows 96.3% identity over a 136aa overlap with an ORF (ORF144a) from strain A of N. meningitidis: 



10 20 30 40 50 60 
orf144.pep  MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQXAASMTFTTLLALVPVLTVMVAVASIF 
I I I I M I I I I I I I I I I t I I I 1 I I I I I I I I I I M I I I 1 I I I I I I I M I M I II I I I I I I I 
orf144a     MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 
10 20 30 40 50 60 

70 80 90 100 110 120 
orf144.pep  PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID 
in iii lit i m inn til i f 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 k 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 
orf144a     PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSXMLIRTID 
70 80 90 100 110 120 

130 
orf144.pep  NTFNRIWRVXXQRPWM 
Mill IN I Mill 
orf144a     NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFXVGSVQDAALASGAPQWSGAL 
130 140 150 170 

The complete length ORF144a nucleotide sequence <SEQ ID 623> is: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 
51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 
101 CGGCGGCAAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTGCTG 
151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGNTGGTC 
201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 
251 ACATGGTNTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 
301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCNGA TGCTGATTCG 
351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC 
401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 
451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATN GTCGGCTCGG TACAGGATGC 
501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 
551 CGACGCTGAN CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTNCGTG 
601 CCAAACCGCT TCGTTCCCGC GCGGCANGCG TTTGTCGGGG CTTTGGCAAC 
651 AGCGTTCTGT CTGGAAACCG CGCGTTCCCT CTTTACTTGG TATATGGGCA 
701 ATTTCGACGG CTACCGCTCG ATTTACGGNG CGTTTGCCGC CGTGCCGTTT 
751 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 
801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGNCT 
851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 
901 GATGCGGCGC AAAAAGAAGG CNAAGCCTTG CCTGTTCAGG AGTTCAGACG 
951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 
1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 
1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 
1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 
1151 TGATGCCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 
1201 CAGGCGAAAA AACAGCAGCA ATCTTGA 

This encodes a protein having amino acid sequence <SEQ ID 624>: 

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL 

51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL 

101 TAIGSVMLVV TSXMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP 

151 LSLGVGISFX VGSVQDAALA SGAPQWSGAL RTAATLXFMT LLLWGLYRXV 

201 PNRFVPARXA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF 

251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRXFDSRGRF DDVLKILLLL 

301 DAAQKEGXAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT 

351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMMPCLQT LNMTLAEFDA 

401 QAKKQQQS* 

ORF144a and ORF144-1 show 97.8% identity in 406 aa overlap: 

orf 1 4 4a . pep MTFLQRLQGLADNKICAFAWFWRRFDEERVPOAAASMTrTTLLALVPVLTVMVAVASIF 

1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 > i < 1 1 1 

or f 1 4 4 - 1 MT FLQRLQGLADNKI CAFAWFWRRFDEERVPQAAASMT FTT L LAL V PVLTVMVA V AS I F 

orfl44a pep PVFDRWSDSFVSFVNQTIVPQGADMVFDYIKAFREQANRLTAIGSVMLWTSXMLIRTID 
I I I I |( I I I I I I I I I ! II I I I I M I I I I I I I I I I I I I I I I f i M I I I 1! I I I I I I I I 1 I 
orf 144-1 pvFDRWSDSFVSFVNOTIVPQGADMVFDYINAFREQANRLTAIGSVMLWTSLMLIRTID 

O r f 1 4 4 a pep NT FNRI WRVNSQRPWMMQFLVYWALLT FG PLSLGVG I S FXVG S VQDAALASGAPQW SGAL 
I | | | | | | | I I I I I I I I I I I i I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 144-1 NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAA1ASGAPQWSGAL 

O r f 1 4 4 a pep RTAATLXFMTLLLWGLYRXVPNRFVPARXAFVGALATAFCLETARSLFTWYMGNFDGYRS 
J | I | | | : | | | | | I I I I I I I 1 I I I I II I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I 
or f 1 4 4 - 1 RTAATLT FMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS 

orf 1 4 4a pep I YGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRXFDSRGRFDDVLKILLLL 

iiitiiiimiiiiiiiiimmiHHimiimii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 144-1 IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 
orf 14 4a pep DAAQKEGXALPVQEFRRHINMGYDEUSELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 f 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 i l i 1 1 1 1 1 1 1 1 1 1 1 

orf 14 4-1 DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 

or f 144a pep FKLFVYRPLPVERDHVKQAVDAVMMPCLQTLNMTLAEFDAQAKXQQOS 408 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I 1 1 I I 1 1 I I : I 
orf144-1 FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 406 

Homology with a predicted ORF from N. gonorrhoeae 

ORF144 shows 91.2% identity over a 136aa overlap with a predicted ORF (ORF144ng) from 
N. gonorrhoeae: 

or f 1 4 4 . pep MT FLQRLQGLADNKI CAFAW FWRR FDEERVPQXAASMT FTT LLALV PVLTVMVAVAS IF 60 

Hill II II I II II II II I : I I 1:1 I till I I I I I I I I I I I I I I I M I M I I I I I I 
orfl4 4ng MTFLQCWQGSADNKICAFAWFVIRRFSEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 60 

orf 144 .pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID 120 

I II I I \ I I I I I I I I I I M I I I I I I I I I M : I I 1 : I I | | | M I I I 1 I I I I I I I II I I I I I 
orf 14 4ng PVFDRWSDSFVSFVNQTIVPQGADMVFDYIDAFRDQANRLTAIGSVMLWTSLMLIRTID 120 

orf 14 4. pep NTFNRIWRVXXQRPWM 136 
IMiliHI :IHII 

orfl44ng NAFNRIWRVNTQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDSVLSSGAQQWADAL 180 

The complete length ORF144ng nucleotide sequence <SEQ ID 625> is predicted to encode a 
protein having amino acid sequence <SEQ ID 626>: 



1 MTFLQCWQGS ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL 

51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL 

101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP 

151 LSLGVGISFM VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV 

201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFAAVPF 

251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 

301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT 

351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 

401 QAKKQQQS* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 627>: 



1 ATGACCTTTT TACAACGTTG GCAAGGTTTG GCGGACAATA AAATCTGTGC 

51 ATTTGCATGG TTCGTCATCC GCCGTTTCAG TGAAGAGCGC GTACCGCAGG 

101 CAGCGGCGAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTACTG 

151 ACCGTAATGG TCGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ATATGGTGTT CGACTATATC GACGCATTCC GCGATCAGGC AAACCGGCTG 

301 ACCGCCATCG GCAGCGTGAT GCTGGTCGTA ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATGCGTTCA ACCGCATCTG GCGGGTTAAC ACGCAACGCC 

401 CCTGGATGAT GCAGTTCCTC GTTTATTGGG CGTTGCTGAC TTTCGGGCCT 

451 TTGTCTTTGG GTGTGGGCAT TTCCTTTATG GTCGGGTCGG TTCAAGACTC 

501 CGTACTCTCC TCCGGAGCGC AACAATGGGC GGACGCGTTG AAGACGGCGG 

551 CAAGGCTGGC TTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCCAACCGCT TCGTGCCCGC CCGGCAGGCG TTTGTCGGAG CTTTGATTAC 

651 GGCATTCTGC CTGGAGACGG CACGTTTCCT GTTCACCTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CATTTGCCGC CGTGCCGTTT 

751 TTCCTGCTGT GGTTAAACCT GCTGTGGACG CTGGTCTTGG GCGGGGCGGT 

801 GCTGACTTCG TCGCTGTCTT ATTGGCAGGG CGAGGCCTTC CGCAGGGGAT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CCGAACCCTG TCCGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGTTACG ATGAATTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGTACGG CTATATCTAT TCCGGCAGAC AGGGCTGGGT TTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAGCGAACTC TTCAAGCTCT TCGTGTACCG 

1101 CCCGTTGCct gtggaAAGGG ATCATGTGAA CCAAGCTGtc gaTGCGGTAA 

1151 TGAcgccgtG TTTGCAGACT TTGAACATGA CGCTGGCGGA GTTTGACGCT 

1201 CAGgcgAAAA AACAGCAGCA GTCTTGA 

This encodes a variant of ORF144ng, having the amino acid sequence <SEQ ID 628; ORF144ng-1>: 

1 MTFLQRWQGL ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL 

51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL 

101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP 

151 LSLGVGISFM VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV 

201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFAAVPF 

251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 

301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT 

351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 

401 QAKKQQQS* 

ORF144ng-1 and ORF144-1 show 94.1% identity in 406 aa overlap: 

orf!44na-l pep MTFWRWQGIADNKICAFAWmRRFSEEW 
5 orf!44-l J^I^IJ^ii^F^^ 

orfl4 4na-l pep PVFDRWSDSEVSFVNQTIVPQGADW^ 

orfl44ng l.pep , . , ■ . . ■ ■ ■ ■ ■ • , . .7, . . ■ |T|| I I I I I I : I I I : I I I I I I I I I I I M I I I I I I I 1 1 I 1 1 
10 PVFDRWS DSFVS FVNQT I VPQGADMVFDYINAFREQANRLTAIGS VMLVVT SLMLIRT ID 

Otf 144ng-l -pep NAmiWRWTQRPWMQFLVYWAU-TFGPLSLG^^^ 

orfl4«-l ntFNRITOVNSQRPWMMQ^W 

15 or*144ng-l.pep KTAARUVFmLLLVTCLYRFVPNRFVPARQAFVG^ 

or.mng i.pep ( ( ( ( mu ,,|, I I I I I I I I I I I I I I 

orfl44-l rtAATLTFOTLlUlYRF^ 
90 orfl44na-l Pep IYGAFAAVPFFLLWLNLLWTLV1/3GAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

20 orfl44ng 1 .pep IY5A ,,,,,,,,,,,,,,,, | | | | | | | | | I | I I I I I I I I I I I I I I I I I I I I 

IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 



25 



orfl44-l 
orfl44ng-l.pep 
orfl44-l 

orfl44na-l pep FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLKMTLREFDAQAKKQQQS 

ort 9 ^ P HI! I H III I I II I I III III I I III I I H I I «• » • l» I 

FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 



DAAQKEGRTLSVQEFRRHINMGYDELGELLEKLARYGYIYSGRQGWVLKTGADSIELSEL 

llllll I:: I III 1 1 1 1 ■ I 1 1 1 1 I ■ I I 1 1 s I I 1 1 1 I I I I I I I I I 1 1 I 1 1 1 1 si J 

DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNi.L 



30 otfl44-l 

On the basis of this analysis, including the identification of several putative transmembrane 
domains in the gonococcal protein, it is predicted that the proteins from N.meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 75 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 629>: 

1 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

51 agccctcgcc gaacacctcc ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

101 GCACCGATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

151 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

201 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 630; ORF146>: 

1 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTDMRQE ISALVILLQR 
51 TRRKWLDAHE RQHLRQSLLE TREHG* 
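
The relationship between a DNA listing and the amino acid sequence it "corresponds to" can be checked with a short sketch. The code below is an illustration only, not the software used to prepare this specification: it assumes the standard bacterial codon assignments and reading frame 1, and the names `CODON_TABLE` and `translate` are hypothetical. Codons containing an ambiguity symbol are reported as `X`.

```python
from itertools import product

# Standard codon assignments, listed in TCAG order (assumed here; the
# specification does not state which translation tool was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    seq = "".join(dna.split()).upper()  # drop the 10-base grouping spaces
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # Any codon with an unresolved base is reported as X.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of SEQ ID 629 as listed above.
print(translate("AGACACGCCC GCCGCATCCG CATCGACACC"))
```

Applied to the first three groups of SEQ ID 629, the sketch returns the first ten residues listed for SEQ ID 630.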

Further work revealed the complete nucleotide sequence <SEQ ID 631>: 

1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 

51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 

101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 

151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 

201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG 

251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 

301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 

351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCAGGGCTGA 

401 CGATGTGTAT GCTCATCGGC GACAACGGCA GCGAATGGCT CGACAGCGGA 

451 CTCATGCGCG CCATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC 






501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGCATGA CCCGCGAACG CCTCGAGGAG AACATGGCGA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCATCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGTAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 
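
The listings in this document share one layout: a position label, then groups of ten residues, fifty per row. A minimal sketch of how such a listing could be read back into a single string is given below. The function name `parse_listing` is hypothetical, and the label check simply assumes that each label equals the first label plus the number of residues already read, which is how the rows above are numbered.

```python
def parse_listing(text: str):
    """Join a numbered listing into one string and report suspicious labels.

    Returns (sequence, bad_labels), where bad_labels holds
    (label_found, label_expected) pairs for rows whose position label
    does not match the running residue count.
    """
    groups = []
    bad_labels = []
    start = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # blank separator line
        if tokens[0].isdigit():
            label = int(tokens[0])
            if start is None:
                start = label
            expected = start + sum(len(g) for g in groups)
            if label != expected:
                bad_labels.append((label, expected))
            tokens = tokens[1:]  # drop the position label
        groups.extend(tokens)
    return "".join(groups), bad_labels


rows = """1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA"""
sequence, problems = parse_listing(rows)
print(len(sequence), problems)
```

A row label that disagrees with the running count shows up in the second return value, which is a quick way to spot a mis-transcribed position number.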

This corresponds to the amino acid sequence <SEQ ID 632; ORF146-1>: 



1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 

51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH 

101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG 

151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG 

201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHG* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF146 shows 98.6% identity over a 74aa overlap with an ORF (ORF146a) from strain A of N. 
meningitidis: 

                                                  10        20        30
orf146.pep                                RHARRIRIDTAINPELEALAEHLHYQWQGF
                                          ||||||||||||||||||||||||||||||
orf146a     KLNGSEIRLLDRHFTLLQTDLQQTVALINGRHARRIRIDTAINPELEALAEHLHYQWQGF
                   280       290       300       310       320       330

                    40        50        60        70
orf146.pep  LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHGX
            |||||:||||||||||||||||||||||||||||||||||||||:
orf146a     LWLSTNMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHSX
                   340       350       360       370

The complete length ORF146a nucleotide sequence <SEQ ID 633> is: 



1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 
51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 
101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 
151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 
201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG 
251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 
301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 
351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCGGGGCTGA 
401 CGATGTGCAT GCTCATCGGC GACAACGGCA GCGAATGGTT CGACAGCGGC 
451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC 
501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 
551 CCGACAACCT GACCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 
601 AGGCGCATGA CCCGCGAACG CCTCGAAGAG AACATGGCGA AAATGCGCCA 
651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 
701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 
751 CGTAAAATTG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 
801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT 
851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC 
901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 
951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 
1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 
1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 
1101 CCTGCTTGAA ACACGGGAAC ACAGTTGA 

This encodes a protein having amino acid sequence <SEQ ID 634>: 

1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 
51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH 
101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWFDSG 
151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLTDC SKMIAEISNG 
201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 
251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 
301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 
351 TRRKWLDAHE RQHLRQSLLE TREHS* 




ORF146a and ORF146-1 show 99.5% identity in 374 aa overlap: 

orf146a.pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf146-1    MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV

orf146a.pep LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf146-1    LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA

orf146a.pep VGKNGYVPMLAGLTMCMLIGDNGSEWFDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR
            |||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||
orf146-1    VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR

orf146a.pep FMLADNLTDCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP
            |||||||:||||||||||||||||||||||||||||||||||||||||||||||||||||
orf146-1    FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP

orf146a.pep AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf146-1    AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING

orf146a.pep RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf146-1    RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE

orf146a.pep RQHLRQSLLETREHSX
            ||||||||||||||:
orf146-1    RQHLRQSLLETREHGX



Homology with a predicted ORF from N. gonorrhoeae 
ORF146 shows 97.3% identity over a 75aa overlap with a predicted ORF (ORF146ng) from 
N. gonorrhoeae: 

orf146.pep                                RHARRIRIDTAINPELEALAEHLHYQWQGF  30
                                          ||||||||||||||||||||||||||||||
orf146ng    KLNGSEIRLLDRHFTLLQTDLQQTAALINGRHARRIRIDTAINPELEALAEHLHYQWQGF  364

orf146.pep  LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHG  75
            |||||:|||||||||| ||||||||||||||||||||||||||||
orf146ng    LWLSTNMRQEISALVIPLQRTRRKWLDAHERQHLRQSLLETREHG  409
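
The identity figure quoted for this overlap can be reproduced with a short calculation. The sketch below is an assumption-laden illustration: it treats the overlap as gapless and counts identical positions only, and the names `percent_identity`, `ORF146` and `ORF146NG_TAIL` are hypothetical. `ORF146NG_TAIL` is the stretch of ORF146ng ending at residue 409 that is aligned against the 75 residues of ORF146.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two equal-length, gapless sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# ORF146 (SEQ ID 630) and the aligned C-terminal stretch of ORF146ng (SEQ ID 636).
ORF146 = ("RHARRIRIDTAINPELEALAEHLHYQWQGF"
          "LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHG")
ORF146NG_TAIL = ("RHARRIRIDTAINPELEALAEHLHYQWQGF"
                 "LWLSTNMRQEISALVIPLQRTRRKWLDAHERQHLRQSLLETREHG")

print(f"{percent_identity(ORF146, ORF146NG_TAIL):.1f}% identity over {len(ORF146)} aa")
```

Two positions differ out of 75, which gives the 97.3% stated above. Alignments that contain gaps would need a proper alignment step first; this sketch does not attempt one.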

An ORF146ng nucleotide sequence <SEQ ID 635> was predicted to encode a protein having amino 
acid sequence <SEQ ID 636>: 

1 MSGVRFPSPA PIPSTDPPSG SLCFFTFPLQ TASDWNSSQR KRLSGRWLNS 
51 YERYRHRRLI HAVRLGGTVL FATALARLLH LQHGEWIGMT VFVVLGMLQF 
101 QGAIYSNAVE RMLGTVIGLG AGLGVLWLNQ HYFHGNLLFY LTIGTASALA 
151 GWAAVGKNGY VPMLAGLTMC MLIGDNGSEW LDSGLMRAMN VLIGAAIAIA 
201 AAKLLPLKST LMWRFMLADN LADCSKMIAE ISNGRRMTRE RLEQNMVKMR 
251 QINARMVKSR SHLAATSGES RISPSMMEAM QHAHRKIVNT TELLLTTAAK 
301 LQSPKLNGSE IRLLDRHFTL LQTDLQQTAA LINGRHARRI RIDTAINPEL 
351 EALAEHLHYQ WQGFLWLSTN MRQEISALVI PLQRTRRKWL DAHERQHLRQ 
401 SLLETREHG* 



Further work revealed the following gonococcal DNA sequence <SEQ ID 637>: 

1 ATGAACTCCT CGCAACGCAA ACGCCTTTCC GgccGCTGGC TCAACTCCTA 

51 CGAACGCTac cGCCaccGCC GCCTCATACA TGCCGTGCGG CTCGGCggaa 

101 ccgtCCTGTT CGCCACCGCA CTCGCCCGgc tACTCCACCT CCAacacggc 

151 gAATGGATAG GGAtgaCCGT CTTCGTCGTC CTCGGCATGC TCCAGTTCCA 

201 AGGCgcgatt tActccaacg cggtgGAacg taTGctcggt acggtcatcg 

251 ggctgGGCGC GGGTTTGGgc gTTTTATGGC TGAACCAGCA TTAtttccac 

301 ggcaacCTcc tcttctacct gaccatcggc acggcaagcg cactggccgg 

351 ctGGGCGGCG GTCGGCAAAA acggctacgt ccctatgctg GCGGGGctgA 

401 CGATGTGCAT gctcatcggc gACAACGGCA GCGAATGGCT CGACAGCGGC 

451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCCGCCATCG CCATTGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGTATGA CGCGCGAACG TTTGGAGCAG AATATGGTCA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC TCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGCAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTC GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGCCGCCCT CATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 638; ORF146ng-1>: 

1 MNSSQRKRLS GRWLNSYERY RHRRLIHAVR LGGTVLFATA LARLLHLQHG 

51 EWIGMTVFVV LGMLQFQGAI YSNAVERMLG TVIGLGAGLG VLWLNQHYFH 

101 GNLLFYLTIG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG 

151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG 

201 RRMTRERLEQ NMVKMRQINA RMVKSRSHLA ATSGESRISP SMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTAALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHG* 

ORF146ng-1 and ORF146-1 show 96.5% identity in 375 aa overlap: 

orf146-1.pep  MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV
              ||:|||:||  |||||||||| ||||||||||| |||||| |||||||||||||||||||
orf146ng-1    MNSSQRKRLSGRWLNSYERYRHRRLIHAVRLGGTVLFATALARLLHLQHGEWIGMTVFVV

orf146-1.pep  LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA
              ||||||||||||:|||||||||||||||||||||||||||||||||||:|||||||||||
orf146ng-1    LGMLQFQGAIYSNAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTIGTASALAGWAA

orf146-1.pep  VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf146ng-1    VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR

orf146-1.pep  FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP
              |||||||||||||||||||||||||||||:||:|||||||||||||||||||||||||||
orf146ng-1    FMLADNLADCSKMIAEISNGRRMTRERLEQNMVKMRQINARMVKSRSHLAATSGESRISP

orf146-1.pep  AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING
              :||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
orf146ng-1    SMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTAALING

orf146-1.pep  RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf146ng-1    RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE

orf146-1.pep  RQHLRQSLLETREHGX
              ||||||||||||||||
orf146ng-1    RQHLRQSLLETREHGX

Furthermore, ORF146ng-1 shows homology with a hypothetical E.coli protein: 

sp|P33011|YEEA_ECOLI HYPOTHETICAL 40.0 KD PROTEIN IN COBU-SBMC INTERGENIC REGION 
>gi|1736674|gnl|PID|d1016553 (D90838) ORF_ID:o348#20; similar to [SwissProt 
Accession Number P33011] [Escherichia coli] >gi|1736682|gnl|PID|d1016560 (D90839) 
ORF_ID:o348#20; similar to [SwissProt Accession Number P33011] [Escherichia coli] 
>gi|1788318 (AE000292) f352; 100% identical to fragment YEEA_ECOLI SW: P33011 but 
has 203 additional C-terminal residues [Escherichia coli] Length = 352 
Score = 109 bits (271), Expect = 2e-23 

Identities = 89/347 (25%), Positives = 150/347 (42%), Gaps = 21/347 (6%) 

Query 20 YRHWUjIHAVRLGGTVLFATALARLLH 7 9 

YRH R++H R+ L + RL + W +T+ V++G + F G + A ER+ 
Sbjct: 15 YRHYRIVHGTRVALAFLLTFLIIRLFTIPESTWPLVTMWIMGPISFWGNWPRAFERIG 74 

10 Query 80 GTVlGl^AGI^WI^QHYFHCaJLLFYLTIGTASALAGWAAVGKNGYVPMLAGLTMCMLI 139 
GTV+G GLLL L+ ALGW A+GK Y +L G+T+ +++ 

Sbjct: 75 GTVLGSILGLIALQLE LISLPLMLVWCAAAMFLCGWLALGKKPYQGLLIGVTLAIVV 131 

Query 140 GDNGSEWLDSGLMRAMNVLIGXXXXXXXXKLLPUCSTLMWRFt^^ 199 
15 G E +D+ L R+ +V++G + P ++ + WR LA +L + +++ + 

Sbjct: 132 GS PTGE-I DTALWRSGDVI LG5LLAMLFTG I WPQRAFI HWRIQLAKSLTEYNRVYQSAFS 190 

Query 200 GRRMTRERIXQNMVKMRQINARMVKSRSHIJ^TSGESRISPSMMEAMQHAHRKIvlTOCXX 259 
+ R RLE K+ VK R +A S E+RI S+ E +Q +R +V 

20 Sbjct: 191 PNLLERPRLESHLQKLL TDAVKMRGLIAPASKETRIPKSIYEGIQTINRNLVCMLEL 247 



25 



Query 260 XXXXXXXXQSPK- — LNGSEIRLLDRHFXXXXXXXXXXAALINGRHARRIRIDTAINPEL 316 

+ LN ++R D AL G +N + 

Sbjct: 248 QINAYWATRPSHFVLLNAQKLR — DTQHMMQQILLSLVHALYEGNPQPVFANTEKLNDAV 305 

Query* 317 EALAEHL — HYQWQ GFLWLSTNMRQEISALVILLQRTRRK 354 

E L + L H+ + G++WL+ ++ L L+ R RK 

Sbjct: 306 EELRQLLNNHHDLKWETPIYGYVWLNMETAHQLELLSNLICRALRK 352 



On the basis of this analysis, including the identification of several transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
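
The specification does not say which program identified the transmembrane domains. As an illustration only, the sketch below uses a Kyte-Doolittle hydropathy scan, one common way of flagging candidate membrane-spanning stretches. The window length of 19 and the threshold of 1.6 are conventional choices assumed here, not values taken from this document, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19):
    """Mean hydropathy of every full window along the sequence."""
    seq = "".join(protein.split()).rstrip("*").upper()
    scores = [KD.get(aa, 0.0) for aa in seq]  # unknown residues score 0
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_tm_windows(protein: str, window: int = 19, threshold: float = 1.6):
    """1-based start positions of windows whose mean hydropathy meets the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, score in enumerate(profile) if score >= threshold]


# Residues 151-200 of ORF146ng-1 (SEQ ID 638), used here only as sample input.
segment = "LMRAMNVLIGAAIAIAAAKLLPLKSTLMWRFMLADNLADCSKMIAEISNG"
print(candidate_tm_windows(segment))
```

Runs of consecutive start positions in the output mark hydrophobic stretches long enough to be considered as candidate membrane spans; dedicated predictors would be needed for anything beyond a first look.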

Example 76 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 639>: 

1 . . GCCGAAGACA CGCGCGTTAC CGCACAGCTT TTGAGCGCGT ACGGCATTCA 

51 GGGCAAACTC GTCAGTGTGC GCGAACACAA CGAACGGCAG ATGGCGGACA 

101 AGATTGTCGG CTATCTTTCA GACGGCATGG TTGTGGCACA GGTTTCCGAT 

151 GCGGGTACGC CGGCCGTGTG CGACCCGGGC GCGAAACTCG CCCGCCGCGT 

201 GCGTGAGGCC GGGTTTAAAG TCGTTCCCGT CGTGGGCGCA AC „ GCGGTGA 

251 TGGCGGCTTT GAGCGTGGCC GGTGTGGAAG GATCCGATTT TTATTTCAAC 

301 GGTTTTGTAC CGCCGAAATC GGGAGAACGC AGGAAACTGT TTGCCAAATG 

351 GGTGCGGGCG GCGTTTCCTA TCGTCATGTT TGAAACGCCG CACCGCATCG 

401 GTGCAGCGCT TGCCGATATG GCGGAACTGT TCCCCGAACG CCGATTAATG 

451 CTGGCGCGCG AAATTACGAA AACGTTTGAA ACGTTCTTAA GCGGCACGGT 

501 TGGGGAAATT CAGACGGCAT TGTCTGCCGA CGGCGACCAA TCGCGCGGCG 

551 AGATGGTGTT GGTGCTTTAT CCGGCGCAGG ATGAAAAACA CGAAGGCTTG 

601 TCCGAGTCCG CGCAAAACAT CATGAAAATC CTCACAGCCG AGCTGCCGAC 

651 CAAACAGGCG GCGGAGCTTG CTGCCAAAAT CACGGGCGAG GGAAAGAAAG 

701 CTTTGTACGA T. . 

This corresponds to the amino acid sequence <SEQ ID 640; ORF147>: 

1 ..AEDTRVTAQL LSAYGIQGKL VSVREHNERQ MADKIVGYLS DGMVVAQVSD 

51 AGTPAVCDPG AKLARRVREA GFKVVPVVGA XAVMAALSVA GVEGSDFYFN 

101 GFVPPKSGER RKLFAKWVRA AFPIVMFETP HRIGAALADM AELFPERRLM 

151 LAREITKTFE TFLSGTVGEI QTALSADGDQ SRGEMVLVLY PAQDEKHEGL 

201 SESAQNIMKI LTAELPTKQA AELAAKITGE GKKALYD.. 

Further work revealed the complete nucleotide sequence <SEQ ID 641>: 

1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 
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201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC 

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 

851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence <SEQ ID 642; ORF147-1>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with hypothetical protein ORF286 of E.coli (accession number U18997) 
ORF147 and E.coli ORF286 protein show 36% aa identity in 237aa overlap: 



0rfl47: 1 
0rf286: 



AE DTRVT AQLLSAYG I QGKLVS VREHNERQMADK I VGYLSDGMWAQVS DAGT PAVCD PG 60 
AEDTR T LL +GI +L ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG 
4 3 AEDTRHTGLLLQHFGINARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPG 102 



0rfl47: 61 AKLARRVREXXXXXXXXXXXXXXXXXXXXXXXEGSDFYFNGFVPPKSGERRKLFAKWVRA 120 

L R RE F + GF+P KS RR 

0rf286: 103 YHLVRTCREAGIRVVPLPGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAE 162 

Orfl47: 121 AFPIVMFETPHRIGAALADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALSAIX3D 179 

++ +E+ HR+ +L D+ + E R ++LARE+TKT+ET VGE+ + D + 

0rf286: 163 PRTLIFYESTHRLLDSLEDIVAVLGESRYWLARELTKTWETIHGAPVGELLAWVKEDEN 222 

Orf 147: 180 QSRGEMVLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALY 236 

+ +GEMVL++ + E L A + +L AELP K+AA LAA+I G K ALY 

0rf286: 223 RRKGEMVLIV-EGHKAQEEDLPADAI*RTIJ^LQAELPLKKAAAIAAEIHGVKKNALY 278 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF147 shows 96.6% identity over a 237aa overlap with ORF75a from strain A of N. meningitidis: 

10 20 30 

orfl47 pep AE DTRVT AQLLSAYG I QGKLVS VREHNERQ 

I I 1 1 I I I 1 1 1 1 I I 1 I I I 1 1 t I 1 t I I 1 I I 1 I 

orf75a tlywatpignladitlralavlqkadiicaedtrvtaqllsaygiqgklvsvrehnerq 

20 30 40 50 60 70 

40 50 60 70 80 90 

orf 147 .pep MADKTVGYLS DGMW AOV S DAGT P AV C D PG AKLARRVREAG F KW P WGAXAVMAAL S V A 
1 I I I I I I I I I I I I I I I I I 1 I I t I t I I I 1 I 1 I t I I 1 I 1 1 t = 1 I I i I i I I I I I I I I 1 I 1 1 I 
orf 75a M&nKTVGYLSDGMWAOVSDAGTPAVCDPGAKLARRVREVGF KWPWGASAVMAALSVA 
80 90 100 110 120 130 

100 110 120 . 130 140 150 

orf 147 .pep GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 

|| | | | | I | II t I I I I I I I I I II I I 11 I I : I I I : I I i i I > I I II i : M M It I 

orf 75a GVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPWMFETPHRIGATLADMAELFPERRLM 
140 150 160 170 180 190 

160 170 180 190 200 210 

orf 147. pep U^ITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 
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, ,. in mi mi, n null 11:111:1111 Mill || || | I! II IIIMMM MM 
200 210 220 230 240 2SQ 

220 230 
orfl47 oep LTAELPTKQAAELAAKITGEGKKALYD 

P P MMMMMMMMMIMMMM 

LTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 
260 270 280 290 



orf75a 



ORF147a is identical to ORF75a, which includes aa 56-292 of ORF75. 
Homology with a predicted ORF from N. gonorrhoeae 

ORF147 shows 94.1% identity over a 237aa overlap with a predicted ORF (ORF147ng) from N. 
gonorrhoeae: 

orf147.pep                                AEDTRVTAQLLSAYGIQGKLVSVREHNERQ  30
                                          ||||||||||||||||||:|||||||||||
orf147ng    TLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGRLVSVREHNERQ  85

orf147.pep  MADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGAXAVMAALSVA  90
            ||||  | |||| ||||||||||||||||||||||||||||||||||||| |||||||||
orf147ng    MADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGASAVMAALSVA  145

orf147.pep  GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM  150
            ||  ||||||||||||||||||||||||||||| ||||||||||| ||||||||||||||
orf147ng    GVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATLADMAELFPERRLM  205

orf147.pep  LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI  210
            |||||||||||||||||||||||| ||| ||||||||||||||||||||||||||| |||
orf147ng    LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNAMKI  265

orf147.pep  LTAELPTKQAAELAAKITGEGKKALYD  237
            |:|||||||||||||||||||||||||
orf147ng    LAAELPTKQAAELAAKITGEGKKALYDLALSWKNK  300

An ORF147ng nucleotide sequence <SEQ ID 643> was predicted to encode a protein having amino 
acid sequence <SEQ ID 644>: 

1 MSVFOTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK 
51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV 
101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGASAVMA ALSVAGVAES 
151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP 
201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE 
251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK 
301 * 

Further work revealed the following gonococcal DNA sequence <SEQ ID 645>: 

1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC 

101 gSSggc GGTATTGCAA AAGGCGGACA TCATTTGTGC cgaagacacg 

151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT 

201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG gtaatcggtt 

251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG tttccgatgc gggtacgccg 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG 

351 gttSaactc GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA 

401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC 

501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG 

551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 gacggcmtg GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 SotStcc GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG 

751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC 

801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT 

851 TGGCACTGTC GTGGAAAAAC AAATGA 

This corresponds to the amino acid sequence <SEQ ID 646; ORF147ng-1>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVAE SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF147ng shows homology to a hypothetical E.coli protein: 

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION 
(F286) 

>gi|606086 (U18997) ORF_f286 [Escherichia coli] 

>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic region 
[Escherichia coli] Length = 286 
Score = 218 bits (550), Expect = 3e-56 
Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%) 

KHLQKASDSWGGTLYWATPIGNLADITLRALAVLQKADI I CAEDTRVTAQLLS AYG I Q 63 
K Q A +S G LY+V TPIGNLADIT RAL VLQ D+I AEDTR T LL +GI 
KQHQSADNSQ— GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59 

GRLVS VREHNERQMADKV I G FLSDGLWAQV S DAGT PAVCDPGAKLARRVREAGFKWPV 123 

RL ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG L R REAG +WP+ 
ARLFALHDHNEQQKAETLLAKLQEGONIALVSDAGTPLINDPGYHLVRTCREAGIRWPL 119 

VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATL 183 
G A + ALS AG+ F + GF+P KS RR ++ +E+ HR+ +L 



D+ + E R ++LARE+TKT+ET VGE+ + D N+ +GEMVL++ 



EL A + +L AELP K+AA LAA+I G K ALY AL 
SEDLPADALRTLALLQAELPLKKAAALAAE I HGVKKNALYKYAL 282 

Based on the computer analysis and the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 77 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 647>: 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGTCGC ATCCGCTTCT C.GCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGcAT TGGTGGGCGt ATCAATATAT TGTGAGCGTG GCACATAACG 

351 GCGGCTATAA CAACGTTGAT TTTGGTGCGG AAGGAAk.AA t ATCCC . GAT 

401 CAACAwCGww TTACTTATAA AATTGTGAAA CGGAATAATT ATAAAGCAGG 

451 GACTAAAGGC CATCCTTATG GCGGCGATTA TCATATGCCG CGTTTGCATA 

501 AATwTGTCAC AGATGCAGAA CCTGTTGAAA TGACCAGTTA TATGGATGGG 

551 CGGAAATATA TCGATCAAAA TAATTACCCT GACCGTGTTC GTATTGGGGC 

601 AGGCAGGCAA TATTGGCGAT CTGATGAAGA TGAGCCCAAT AACCGCGAAA. 

651 GTTCATATCA TATTGCAAGT 

701 GGCTC ACCAATGTTT ATCTATGATG CCCAAAAGCA 

751 AAAGTGGTTA ATTAATGGGG TATTGCAAAC GGGCAACCCC TATATAGGAA 

801 AAAGCAATGG CTTCCAGCTG GTTCGTAAAG ATTGGTTCTA TGATGAAATC 

851 TTTGCTGGAG ATACCCATTC AGTATTCTAC GAACCACGTC AAAATGGGAA 

901 ATACTCTTTT AACGACGATA ATAATGGCAC AGGAAAAATC AATGCCAAAC 

951 ATGAACACAA TTCTCTGCCT AATAGATTAA AAACACGAAC CGTTCAATTG 

1001 TTTAATGTTT CTTTATCCGA GACAGCAAGA GAACCTGTTT ATCATGCTGC 

1051 AGGTGGTGTC AACAGTTATC GACCCAGACT GAATAATGGA GAAAATATTT 

1101 CCTTTATTGA CGAAGGAAAA GGCGAATTGA TACTTACCAG CAACATCAAT 

1151 CAAGGTGCTG GAGGATTATA TTTCCAAGGA GATTTTACGG TCTCGCCTGA 

1201 AAATAACGAA ACTTGGCAAG GCGCGGGCGT TCATATCAGT GAAGACAGTA 

1251 CCGTTACTTG GAAAGTAAAC GGCGTGGCAA ACGACCGCCT GTCCAAAATC 

1301 GGCAAAGGCA CGCTG 

2101 
2151 
2201 
2251 
2301 
2351 
2401 
2451 
2501 
2551 
2601 
2651 
2701 
2751 
2801 
2851 
2901 
2951 
3001 

3551 
3601 
3651 
3701 
3751 
3801 
3851 
3901 
3951 
4001 
4051 
4101 
4151 
4201 
4251 
4301 
4351 



// 



TGACTGCTTC 
GATCACGCTC 
TAGTGCAAAT 
ACGGCAACCk 
ACATTAAACG 
CGACCACGCC 
CAAACGTAAG 
GCAGTATTCC 
CAagGATACG 
GarCGGAATT 
TCCGCCTATC 
TGCGCCGCGC 
CACCGCCAAC 
AAATTGAACG 
CCGCAGCGAC 
TGGCGGTCAA 
GTAGTGGAAG 
CCTGCAAAAC 



ATTGACTAAG 
ATTTAAATCT 
GGCGATACAC 
TAgCCtCGtG 
GCAACACATC 
GTACAAAACG 
CCATTCCGCA 
ATTTTGAAAG 
GCATTACACT 
AGGCAATTTA 
GCCACGATGC 
CGCCGTTCGC 
TTCGGTAGAA 
GTCAGGGAAC 
AAATTGAAGC 
CAATACCGGC 
GAAAAGACAA 
GAACACGTCG 



CCGCAACGCC 
CGCAAGATTT 
ATGCAGAAAA 
CCGGACCGAA 
CCCACGGCGC 
ATCAGnCGCG 
GAGsmAAAwT 
CGCGCCGgtt 
crATTTCGTC 
CCCCCGGCCT 
TCATTCAAAC 
CTATACCGAT 
TATTGGCTCA 
GCCGAAATCA 
CCCGCAACTG 
GGTAA. . 



GTTTGGACAA 
CCGCGCCTAC 
ACCTCGGCAG 
AACACCTTCG 
CGTTTTCGGG 
GGCGCGGGTT 
CCGCCGCCGC 
tCggCGgATt 
CAAAAAGCGG 
TGCATTCAAC 
CGGCGCAACA 
GCCGCTTCGG 
GGATTTCGGC 
AAGGTTTCAC 
GAAGCGCAAC 



ACCGACATCA 
CACAGGGCTT 
GTTATACAGT 
G.SAATGCCC 
GGCTTCgGGC 
GCAGTCTGAC 
CTCAACGGTA 
CAGCCGCTTT 
TAAAAGACAG 
AACCTTGACA 
GGCAGGGGCG 
GCCGTTCGCG 
TCCCGTTTCA 
ATTCCGCTTT 
TGGCGGAAAG 
AACGAACCTG 
CAAACCGCTG 
ATGCAGGCGC 
II 

TTAGAC 

GCGGCATCCG 
CGCCAACAAA 
CGGGCGCGTC 
ACGACGGCAT 
CAATACGGCA 
TTAGCAGCGG 
GTGCtGCATT 
CGGCATCGAA 
ATTACCGCTA 
CGcTACCGCG 
CATTTCCATC 
GCAAAGTCCG 
AAAACCCGCA 
GCTGTCCCTC 
ACAGCGCGGG 



GCGGCAATGT 
GCCACACTCA 
CAGCCACAAC 
AAGCAACATT 
AATGCTTCAT 
GCTTTCCGGC 
ATGTCTCCCT 
ACCGGACAAA 
CGAATGGACG 
ACGCCACCAT 
CAAACCGGCA 
CCGTTCCCTA 
ACACGCTGAC 
ATGTCGGAAC 
TTCCGAAGGC 
CAAGCCTCGA 
TCCGAAAACC 
GTGG 



. . . GATAAAG 
CGATCTTGCC 
ACGGCAATCT 
GCCACCCAAA 
TAATCAAGCC 
TTAATCTAAG 
AACGCTAAGG 
AGCCGATAAG 
TCAGCGGCGG 
CTGCCGTCAg 
TACaCTCAAT 
GTGCGACAGA 
TTATmCGTTA 
GGTAAACGGC 
TCTTCGGCTA 
ACTTACACCT 
ACAATTGACG 
TTAATTTCAC 



CGCGTATTTG 
GGACACCAAA 
CCGACCTGCG 
GGCATCCTGT 
CGGCAACTCG 
TCGACAGGTT 
CAGCCTTTcA 
ACGGCATTCA 
CCGCACATCG 
CGAAAACGTC 
CGGGCATTAa 
ACGCCTTATT 
AACACGCGTC 
GTGCGGAATG 
CACGCTGCCG 
CATCAAATTA 



CCGAAGACCG 
CACTACCGTT 
CCAAATCGGT 
TTTCGCACAA 
GCACGGCTTG 
CTACATCGGC 
GACGGCATCG 
GGCACGAtAC 
GCGCAACGCg 
AATATCGCCA 
GGCAGATTAT 
TGAGCCTGTC 
AATACCGCCG 
GGgCGTAAAC 
CCGCCAAAGG 
GGCTACCGCT 



H J 3 J. ou i ftn . . . 

This corresponds to the amino acid sequence <SEQ ID 648; ORF1>: 






i 

51 
101 
151 
201 
251 
301 
351 
401 

701 
751 
801 
851 
901 
951 
1001 

1151 
1201 
1251 
1301 
1351 



MKTTDKRTTE 
YQYYRDFAEN 
VAALVGVQYI 
TKGHPYGGDY 
GRQYWRSDED 
KWLINGVLQT 
YSFNDDNNGT 
GGVNSYRPRL 
NNETWQGAGV 



THRKAPKTGR 
KGKFAVGAKD 
VSVAHNGGYN 
HMPRLHKXVT 
EPNNRESSYH 
GNPYIGKSNG 
GKINAKHERN 
NNGENISFID 
HISEDSTVTW 



SANGDTRYTV 
DHAVQNGSLT 
KDTALHLKDS 
APRRRSRRSR 
RSDKLKLAES 
LQNEHVDAGA 



.... DKVTAS 
SHNATONGNX 
LSGNAKANVS 
EWTLPSGXEL 
RSLLXVTPPT 
SEGTYTLAVN 
W 



IRFXAAYLAI 
IEVYNKKGEL 
NVDFGAEGXN 
DAEPVEMTSY 

IAS 

FQLVRKDWFY 
SLPNRLKTRT 
EGKGELILTS 
KVNGVANDRL 
II 

LTKTD1SGNV 
SLVXNAQATF 
HSALKGNVSL 
GNLNLDNATI 
SVESRFNTLT 
NTGNEPASLE 



CLSFGILPQA 
VGKSMTKAPM 
IXDQXRXTYK 
MDGRKYIDQN 

GS 

DEIFAGDTHS 
VQLFNVSLSE 
NINQGAGGLY 
SKIGKGTL. . 



WAGHTYFGIN 
IDFSWSRNG 
IVKRNNYKAG 
NYPDRVRIGA 
PMFIYDAQKQ 
VFYEPRQNGK 
TAREPVYHAA 
FQGDFTVSPE 



DLADHAHLNL 
NQATLNGNTS 
ADKAVFHFES 
TLNSAYRHDA 
VNGKLNGQGT 
QLTWEGKDN 



TGLATLNGNL 
ASGNASFNLS 
SRFTGQISGG 
AGAQTGSATD 
FRFMSELFGY 
KPLSENLNFT 



// 



RNAVWTSGIR DTKHYRSQDF RAYRQQTDLR QIGMQKNLGS 
RTENTFDDGI GNSARLAHGA VFGQYGIDRF YIGISAGAGF 
XKXRRRVLHY GIQARYRAGF GGFGIEPHIG ATRYFVQKAD 
PGLAFNRYRA GIKADYSFKP AQHISITPYL SLSYTDAASG 



. LDRVFAEDR 
GRVGILFSHN 
SSGSLSDGIG 
YRYENVNIAT 
KVRTRVNTAV 






1401 IAQDFGKTRS AEWGVNAEIK GFTLSLHAAA AKGPQLEAQH SAGIKLGYRW 
1451 * 

Further sequencing analysis revealed the complete nucleotide sequence <SEQ ID 649>: 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 
51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 
101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 
151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 
201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 
251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 
301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 
351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGAAAT CCCGATCAAC 
401 ATCGTTTTAC TTATAAAATT GTGAAACGGA ATAATTATAA AGCAGGGACT 
451 AAAGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCATAAATT 
501 TGTCACAGAT GCAGAACCTG TTGAAATGAC CAGTTATATG GATGGGCGGA 
551 AATATATCGA TCAAAATAAT TACCCTGACC GTGTTCGTAT TGGGGCAGGC 
601 AGGCAATATT GGCGATCTGA TGAAGATGAG CCCAATAACC GCGAAAGTTC 
651 ATATCATATT GCAAGTGCGT ATTCTTGGCT CGTTGGTGGC AATACCTTTG 
701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG TGAAAAAATT 
751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG 
801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 
851 ATGGGGTATT GCAAACGGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 
901 CAGCTGGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 
951 CCATTCAGTA TTCTACGAAC CACGTCAAAA TGGGAAATAC TCTTTTAACG 
1001 ACGATAATAA TGGCACAGGA AAAATCAATG CCAAACATGA ACACAATTCT 
1051 CTGCCTAATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 
1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGTGTCAACA 
1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACGAA 
1201 GGAAAAGGCG AATTGATACT TACCAGCAAC ATCAATCAAG GTGCTGGAGG 
1251 ATTATATTTC CAAGGAGATT TTACGGTCTC GCCTGAAAAT AACGAAACTT 
1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGAAG ACAGTACCGT TACTTGGAAA 
1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 
1401 GCACGTTCAA GCCAAAGGGG AAAACCAAGG CTCGATCAGC GTGGGCGACG 
1451 GTACAGTCAT TTTGGATCAG CAGGCAGACG ATAAAGGCAA AAAACAAGCC 
1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGTACGGTGC AACTGAATGC 
1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC 
1601 GTTTGGATTT AAACGGGCAT TCGCTTTCGT TCCACCGTAT TCAAAATACC 
1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 
1701 TACCATTACA GGCAATAAAG ATATTGCTAC AACCGGCAAT AACAACAGCT 
1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 
1801 ACGACCAAAA CGAACGGGCG GCTCAACCTT GTTTACCAGC CCGCCGCAGA 
1851 AGACCGCACC CTGCTGCTTT CCGGCGGAAC AAATTTAAAC GGCAACATCA 
1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCAAC ACCGCACGCC 
1951 TACAATCATT TAAACGACCA TTGGTCGCAA AAAGAGGGCA TTCCTCGCGG 
2001 GGAAATCGTG TGGGACAACG ACTGGATCAA CCGCACATTT AAAGCGGAAA 
2051 ACTTCCAAAT TAAAGGCGGA CAGGCGGTGG TTTCCCGCAA TGTTGCCAAA 
2101 GTGAAAGGCG ATTGGCATTT GAGCAATCAC GCCCAAGCAG TTTTTGGTGT 
2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 
2201 TGACAAATTG TGTCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 
2251 TTGACTAAGA CCGACATCAG CGGCAATGTC GATCTTGCCG ATCACGCTCA 
2301 TTTAAATCTC ACAGGGCTTG CCACACTCAA CGGCAATCTT AGTGCAAATG 
2351 GCGATACACG TTATACAGTC AGCCACAACG CCACCCAAAA CGGCAACCTT 
2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG 
2451 CAACACATCG GCTTCGGGCA ATGCTTCATT TAATCTAAGC GACCACGCCG 
2501 TACAAAACGG CAGTCTGACG CTTTCCGGCA ACGCTAAGGC AAACGTAAGC 
2551 CATTCCGCAC TCAACGGTAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA 
2601 TTTTGAAAGC AGCCGCTTTA CCGGACAAAT CAGCGGCGGC AAGGATACGG 
2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCAGG CACGGAATTA 
2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG 
2751 CCACGATGCG GCAGGGGCGC AAACCGGCAG TGCGACAGAT GCGCCGCGCC 
2801 GCCGTTCGCG CCGTTCGCGC CGTTCCCTAT TATCCGTTAC ACCGCCAACT 
2851 TCGGTAGAAT CCCGTTTCAA CACGCTGACG GTAAACGGCA AATTGAACGG 
2901 TCAGGGAACA TTCCGCTTTA TGTCGGAACT CTTCGGCTAC CGCAGCGACA 
2951 AATTGAAGCT GGCGGAAAGT TCCGAAGGCA CTTACACCTT GGCGGTCAAC 
3001 AATACCGGCA ACGAACCTGC AAGCCTCGAA CAATTGACGG TAGTGGAAGG 
3051 AAAAGACAAC AAACCGCTGT CCGAAAACCT TAATTTCACC CTGCAAAACG 
3101 AACACGTCGA TGCCGGCGCG TGGCGTTACC AACTCATCCG CAAAGACGGC 
3151 GAGTTCCGCC TGCATAATCC GGTCAAAGAA CAAGAGCTTT CCGACAAACT 
3201 CGGCAAGGCA GAAGCCAAAA AACAGGCGGA AAAAGACAAC GCGCAAAGCC 
3251 TTGACGCGCT GATTGCGGCC GGGCGCGATG CCGTCGAAAA GACAGAAAGC 
3301 GTTGCCGAAC CGGCCCGGCA GGCAGGCGGG GAAAATGTCG GCATTATGCA 
3351 GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC GGATAAAGAC ACCGCCTTGG

3401 CGAAACAGCG CGAAGCGGAA ACCCGGCCGG CTACCACCGC CTTCCCCCGC 

3451 GCCCGCCGCG CCCGCCGGGA TTTGCCGCAA CTGCAACCCC AACCGCAGCC 

3501 CCAACCGCAG CGCGACCTGA TCAGCCGTTA TGCCAATAGC GGTTTGAGTG 

3551 AATTTTCCGC CACGCTCAAC AGCGTTTTCG CCGTACAGGA CGAATTAGAC

3601 CGCGTATTTG CCGAAGACCG CCGCAACGCC GTTTGGACAA GCGGCATCCG 

3651 GGACACCAAA CACTACCGTT CGCAAGATTT CCGCGCCTAC CGCCAACAAA 

3701 CCGACCTGCG CCAAATCGGT ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC 

3751 GGCATCCTGT TTTCGCACAA CCGGACCGAA AACACCTTCG ACGACGGCAT 

3801 CGGCAACTCG GCACGGCTTG CCCACGGCGC CGTTTTCGGG CAATACGGCA

3851 TCGACAGGTT CTACATCGGC ATCAGCGCGG GCGCGGGTTT TAGCAGCGGC 

3901 AGCCTTTCAG ACGGCATCGG AGGCAAAATC CGCCGCCGCG TGCTGCATTA 

3951 CGGCATTCAG GCACGATACC GCGCCGGTTT CGGCGGATTC GGCATCGAAC 

4001 CGCACATCGG CGCAACGCGC TATTTCGTCC AAAAAGCGGA TTACCGCTAC 

4051 GAAAACGTCA ATATCGCCAC CCCCGGCCTT GCATTCAACC GCTACCGCGC

4101 GGGCATTAAG GCAGATTATT CATTCAAACC GGCGCAACAC ATTTCCATCA 

4151 CGCCTTATTT GAGCCTGTCC TATACCGATG CCGCTTCGGG CAAAGTCCGA 

4201 ACACGCGTCA ATACCGCCGT ATTGGCTCAG GATTTCGGCA AAACCCGCAG 

4251 TGCGGAATGG GGCGTAAACG CCGAAATCAA AGGTTTCACG CTGTCCCTCC

4301 ACGCTGCCGC CGCCAAAGGC CCGCAACTGG AAGCGCAACA CAGCGCGGGC

4351 ATCAAATTAG GCTACCGCTG GTAA

This corresponds to the amino acid sequence <SEQ ID 650; ORF1-1>:

1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG

101 VAALVGDQYI VSVAHNGGYN NVDFGAEGRN PDQHRFTYKI VKRNNYKAGT

151 KGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGRKYIDQNN YPDRVRIGAG 

201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI 

251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF 

301 QLVRKDWFYD EIFAGDTHSV FYEPRQNGKY SFNDDNNGTG KINAKHEHNS 

351 LPNRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDE

401 GKGELILTSN INQGAGGLYF QGDFTVSPEN NETWQGAGVH ISEDSTVTWK 

451 VNGVANDRLS KIGKGTLHVQ AKGENQGSIS VGDGTVILDQ QADDKGKKQA 

501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT 

551 DEGAMIVNHN QDKESTVTIT GNKDIATTGN NNSLDSKKEI AYNGWFGEKD 

601 TTKTNGRLNL VYQPAAEDRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA

651 YNHLNDHWSQ KEGIPRGEIV WDNDWINRTF KAENFQIKGG QAVVSRNVAK

701 VKGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTNCVEK TITDDKVIAS 

751 LTKTDISGNV DLADHAHLNL TGLATLNGNL SANGDTRYTV SHNATQNGNL 

801 SLVGNAQATF NQATLNGNTS ASGNASFNLS DHAVQNGSLT LSGNAKANVS 

851 HSALNGNVSL ADKAVFHFES SRFTGQISGG KDTALHLKDS EWTLPSGTEL

901 GNLNLDNATI TLNSAYRHDA AGAQTGSATD APRRRSRRSR RSLLSVTPPT 

951 SVESRFNTLT VNGKLNGQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN 

1001 NTGNEPASLE QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG

1051 EFRLHNPVKE QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAVEKTES 

1101 VAEPARQAGG ENVGIMQAEE EKKRVQADKD TALAKQREAE TRPATTAFPR

1151 ARRARRDLPQ LQPQPQPQPQ RDLISRYANS GLSEFSATLN SVFAVQDELD

1201 RVFAEDRRNA VWTSGIRDTK HYRSQDFRAY RQQTDLRQIG MQKNLGSGRV 

1251 GILFSHNRTE NTFDDGIGNS ARLAHGAVFG QYGIDRFYIG ISAGAGFSSG 

1301 SLSDGIGGKI RRRVLHYGIQ ARYRAGFGGF GIEPHIGATR YFVQKADYRY 

1351 ENVNIATPGL AFNRYRAGIK ADYSFKPAQH ISITPYLSLS YTDAASGKVR

1401 TRVNTAVLAQ DFGKTRSAEW GVNAEIKGFT LSLHAAAAKG PQLEAQHSAG 

1451 IKLGYRW* 
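The correspondence between the nucleotide listing <SEQ ID 649> and the amino acid listing <SEQ ID 650> can be spot-checked by translating the DNA in the first reading frame. The following is a minimal sketch, assuming the standard genetic code; the helper names `clean` and `translate` are illustrative and are not part of the original disclosure. Codons containing an ambiguous base (such as N) are reported as X, which is the convention the listings appear to follow.

```python
# Standard genetic code, built in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def clean(listing: str) -> str:
    """Drop position numbers and whitespace from a listing block; upper-case the bases."""
    return "".join(ch for ch in listing.upper() if ch.isalpha())


def translate(dna: str) -> str:
    """Translate in frame 1; any codon not in the table (e.g. with N) becomes 'X'."""
    dna = clean(dna)
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# First 30 bases of the complete sequence, copied with its position number.
first = translate("1 ATGAAAACAA CCGACAAACG GACAACCGAA")
# Last 15 bases of the complete sequence, including the stop codon.
last = translate("GGCTACCGCT GGTAA")
print(first, last)  # MKTTDKRTTE GYRW*
```

The two fragments reproduce the first ten residues and the final residues (with the terminating `*`) of the amino acid listing above.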

Computer analysis of these sequences gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)
ORF1 shows 57.8% identity over a 1456aa overlap with an ORF (ORF1a) from strain A of
N. meningitidis:

10 20 30 40 50 60 

orf 1 . pep MKTTDKRTTETHRKAPKTG RIRFXAAYLAICLSFGIL PQAWAGHTYFGINYQYYRDFAEN 

I II III Nil IMHI ! III!! I I I I 1 I I I I III I I I I I I I I I I M I | Milt 

60 orf la MKTTDKRTTETHRKAPKTG RIRFS PAYLAI CLS FGI L PQAWAGHT YFG I N YQYYRD FAEN 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 1 . pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPM I DFSVVSRNGVAALVGVQY I VSVAHNGGYN 
MllllllllllMiMllllllMMMIIIIIIilMIMMM tllltlllMIII
orf1a KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALVGDQYIVSVAHNGGYN
70 80 90 100 110 120

130 140 I 50 160 170 180 

orfl oep NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 

lllll! I I : I : I M :: 111:11 lllllll Mlllllllll 

orfla NVDFGAEGXN-PDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSD 

130 140 150 160 170 

190 200 210 

orfl . pep MDGRKYI DQNNYPDRVRIGAGRQYWRS DEDEP NN 

I I I I::: I I: I 1 1 II: I :: I I I I : t: II 
orfla MRGNTYSDKEKYPERVRIGSGHHYWRYDDDKHGDLSYSGAWLIGGNTHMQGWGNNGVXSL 
180 190 200 210 220 230 

220 230 240 250 260 

orfl pep RESSYH 1 A SGS PMFI YDAQKQKWLINGVLQTGNPY IGKSNGFQLVRK 

I : : : : || I I I I I I I I I :: I I I : I I I I I I I I I I : I I I I I : I I 

orfla SGDVRHANDYGPMPIAGAAGDSGSPMFIYDKTNNKWLLNGVLQTGYPYSGRENGFQLIRK 
240 250 260 270 280 290 

270 280 290 300 310 320 

or'l.pep DWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRUCTRTVQLFNV 

I M I I : I : 1111:1 : 1 1 I : II : : I I : : : I 1 1 1 1 = : : I : I I : I I : : I I : I I : 
orfla DWFYDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQTVRLFDE 
300 310 320 330 340 350 

330 340 350 360 370 380 

orf 1 . pep S LSETAREPVYHAAGGVNS YRPRLNNGENI S FI DEGKGELILTSN INQGAGGLY FQGDFT 

I I : i I : I I I I I I II I I : I I I I I II I I I : I I I I 1 : 1 : 1 1 I :: I I I I I M I I I 1 : 1 1 1 I 
orfla SLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLYFEGDFT 

360 370 380 390 400 410 

390 400 410 420 430 

orfl . pep VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTL 

I I I I I I I I I I II I II I II I I I I I II II I I I I I I M I I I I I 1 I 
orfla VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSISVGDGT 
420 430 440 450 460 470 



45 



orfl . pep 
orfla 



VILDQQADDKGKKQAFSEIGLXSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGHSLSFH 
480 490 500 510 520 530 



50 



orf 1. pep 
orfla 



RIQNTDEGAMIXXHNATTTSTVTITGNESITQPSGKNINRLNYSKE IAYNGWFGEKDTTK 
540 550 560 570 580 590 



55 



orf 1 .pep 
orfla 



TNGRLNLVYQPAAEDRTXLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSGWSKMEG 
600 610 620 630 640 650 



60 



65 



70 



orfl. pep 
orfla 

orfl. pep 
orfla 

off 1. pep 



IPQGEIVWDNDWIXRTFKAENFHIQGGQAVISRNVAKVEGDXHLSNHAQAVFGVAPHQSH 
660 670 680 690 700 710 

440 450 460 470 480 

XXXXXDKVT AS LTKTD I SGNVDLADHAHLNLTG LAT LNGNLS AN 

: 11:11111111111111 1:1 1:1 llllll 

TICTRSDVTTGLTNCVEXXITDDKVIASLTKTDXSC^CVXLXXXXXXXLXGXAXLXGNLSAN 
720 730 740 750 760 770 

490 500 510 520 530 540 

GDTRYTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLTLSG 
I 111 Mill || || t|| Ml I I II Ml 1111111:1 || | | Ml ||:: I :ll 1 1 1 II I
or f 1 a GDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNXSXSGNAS FNLSNNAAQNGSLTLSD 

780 790 800 810 820 830 

550 560 570 580 590 600 

orfl pep NAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGNL 

I I 1 I I I I f 1 I ] I 1 I | IIIMI:IIIIII:||:| | | | | || I I I II I I I I : II I I I 

orfla NAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSGTELGNL 
840 850 860 870 880 890 

610 620 630 640 650 660 

orfl pep NLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVESRFNTLTVNG 
IIIIIIIIMMMIIMMM! II I I I I I I M I I I 1 1 1 1 I I I 

orfla NLDNATITLNSAYRHDAAGAQTGXVSDTPRRRSRRS LLSVTPPTSVESRFN7LTVNG 

900 910 920 930 940 950 

670 680 690 700 710 720 

orfl pep KLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEGKDNKPL 
II) | t I I t I I I I I I I I I I 1 I I t ! I 1 1 i I I I I I I I t I I I 1 I I t : f 1 : I I I I I I I I I i t 1 1 
orfla KI^XQGTFRFMSELFGYRSDKLiaAESSEGTYTIAWNTGNEPVSLDQLTVVEGKDNKPL 
960 970 980 990 1000 1010 



730 740 750 

orfl . pep SENLNFTLQNEHVDAGAW 

25 I I I I I I I I I I I I 1 I I t I I 

orfla SENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAEKDNAQS 
1020 1030 1040 1050 1060 1070 



55 



60 



65 



orfl. pep 
orfla 

orfl. pep 
orfla 

orfl. pep 
orfla 

orfl. pep 
orfla 

orfl. pep 
orfla 

orfl .pep 
orfla 

orfl. pep 
orfla 



LDALIAAGRDAAEKTESVAEPARXAGGENVGIMQAEEEKKRVQADKDSALAKQREAETRP 
1080 1090 1100 1110 1120 1130 



760 



-LDR 



XTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAVQDELDR 
1140 1150 1160 1170 1180 1190 

770 780 790 800 810 820 

VFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN 
I | | | I | | I | I M I II I I I I I I ! I 1 I I I I I I I II I I I I M I I I I I I I I I I I I I I I I I I I 
VFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN 
1200 1210 1220 1230 1240 1250 

830 840 850 860 870 880 

T FDDG I GN S ARLAHGAV FGQ YG I DRFY I G I SAG AG FS S G S L S DG I GXKXRRRVLH YG I QA 
: I || I II I I II I I I I I I I I II I I II 1111:1111111 IIUII I I I I I M 1 1 1 I I 
XFDDGIGNSARLAHGAVFGQYGIGRFDIGISTGAGFSSGXLSDGIGGKIRRRVLHYGIQA 
1260 1270 1280 1290 1300 1310 

890 900 910 920 930 940 

RYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPAQHI 

M M I I I I I I I I I : I I I M II I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I 
RYRAGFGGFGIEPYIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPAQHX 
1320 1330 1340 1350 1360 1370 

950 960 970 990 990 1000 

SITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGP 

Miti Miiiiiiiiiiiiiiiiiimmiiimiiiiimiiiii iimm 

SITPYXSLSYTDAASGKVRTRVKTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHAAAAKGP 
1380 1390 1400 1410 1420 1430 

1010 1020 
QLEAQHSAGIKLGYRWX 

in Mini 1 1 1 ii 1 1 1 

QLEAQHSAGIKLGYRWX 
1440 1450 
1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA
51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT
101 TCGGCATTCT TCCCCAAGCT TGGGCGGGAC ACACTTATTT CGGCATCAAC
151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG
201 GGCGAAAGAT ATTGAGGTNT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT
251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC
301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG
351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGNAAT CCCGATCAGC
401 ACCGTTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA GCCTGACAAT
451 TCACACCCTT ACAACGGCGA TTANCATATG CCGCGTTTGC ATAAATTTGT
501 CACAGATGCA GAACCTGTCG AAATGACGAG TGACATGAGG GGGAATACCT
551 ATTCCGATAA AGAAAAATAT CCCGAGCGTG TCCGCATCGG CTCAGGACAC
601 CACTATTGGC GTTATGATGA TGACAAACAC GGCGATTTAT CCTACTCCGG
651 CGCATGGTTA ATTGGCGGCA ATACACATAT GCAGGGTTGG GGAAATAATG
701 GCGTANTTAG TTTGAGCGGC GATGTGCGCC ATGCCAACGA CTATGGCCCT
751 ATGCCGATTG CAGGTGCGGC AGGCGACAGC GGTTCGCCAA TGTTTATTTA
801 TGACAAAACA AACAATAAAT GGCTGCTCAA CGGAGTTTTA CAAACCGGCT
851 ACCCTTATTC CGGCAGGGAA AACGGTTTCC AGCTGATACG CAAAGATTGG
901 TTCTACGATG ACATTTACAG AGGCGATACA CATACCGTCT NTTTTGAACC
951 GCGCAGTAAC GGACATTTTT CCTTTACATC CAACAACAAC GGTACGGGTA
1001 CGGTAACAGA AACCAACGAA AAGGTNTCCA ATCCAAAGCT TAAAGTACAG
1051 ACAGTCCGAC TGTTTGACGA ATCTTTGAAT GAAACTGATA AAGAACCAGT
1101 TTACGCGGCA GGGGGTGTTA ATCAGTACCG TCCAAGGTTA AACAACGGTG
1151 AAAACCTTTC TTTTATCGAT TACGGCAACG GCAAACTCAT CTTATCAAAC
1201 AACATCAACC AAGGCGCGGG CGGTTTGTAT TTTGAAGGTG ATTTTACGGT
1251 CTCGCCTGAA AACAACGAAA CGTGGCAAGG CGCGGGCGTT CATATCAGTG
1301 AAGACAGTAC CGTTACTTGG AAAGTAAACG GCGTGGCAAA CGACCGCCTG
1351 TCCAAAATCG GCAAAGGCAC GCTGCACGTT CAAGCCAAAG GGGAAAACCA
1401 AGGCTCGATC AGCGTGGGCG ACGGTACAGT CATTTTGGAT CAGCAGGCAG
1451 ACGATAAAGG CAAAAAACAA GCCTTTAGTG AAATCGGCTT GNTCAGCGGC
1501 AGGGGTACGG TGCAACTGAA TGCCGATAAT CAGTTCAACC CCGACAAACT
1551 CTATTTCGGC TTTCGCGGCG GACGTTTGGA TTTAAACGGG CATTCGCTTT
1601 CGTTCCACCG TATTCAAAAT ACCGATGAAG GGGCGATGAT TGNCNATCAT
1651 AATGCCACAA CAACATCCAC CGTTACCATT ACAGGGAATG AAAGTATTAC
1701 ACAACCGAGT GGTAAGAATA TCAATAGACT TAATTACAGC AAAGAAATTG
1751 CCTACAACGG TTGGTTTGGC GAGAAAGATA CGACCAAAAC GAACGGGCGG
1801 CTCAACCTTG TTTACCAGCC CGCCGCAGAA GACCGCACCC NGCTGCTTTC
1851 CGGCGGAACA AATTTAAACG GCAACATCAC GCAAACAAAC GGCAAACTGT
1901 TTTTCAGCGG CAGACCGACA CCGCACGCCT ACAATCATTT AGGAAGCGGG
1951 TGGTCAAAAA TGGAAGGTAT CCCACAAGGA GAAATCGTGT GGGACAACGA
2001 CTGGATCNAC CGCACGTTTA AAGCGGAAAA TTTCCATATT CAGGGCGGGC
2051 AGGCGGTGAT TTCCCGCAAT GTTGCCAAAG TGGAAGGCGA TTGNCATTTG
2101 AGCAATCACG CCCAAGCAGT TTTTGGTGTC GCACCGCATC AAAGCCATAC
2151 AATCTGTACA CGTTCGGACT GGACNGGTCT GACAAATTGT GTCGAANAAA
2201 NCATTACCGA CGATAAAGTG ATTGCTTCAT TGACTAAGAC NGACNTNAGC
2251 GGCANTGTNA GNCTNNCCNA TNACGNTHNT TNAAANCTCN CNGGGCNTGC
2301 NNCACTHAAN GGCAATCTTA GTGCAAATGG CGATACACGT TATACAGTCA
2351 GCCACAACGC CACCCAAAAC GGCAACCTTA GCCTCGTGGG CAATGCCCAA
2401 GCAACATTTA ATCAAGCCAC ATTAAACGGC AACNCATCGG NTTCGGGCAA
2451 TGCTTCATTT AATCTAAGCA ACAACGCCGC ACAAAACGGC AGTCTGACGC
2501 TTTCCGACAA CGCTAAGGCA AACGTAAGCC ATTCCGCACT CAACGGCAAT
2551 GTCTCCCTAG CCGATAAGGC AGTATTCCAT TTTGAAAACA GCCGCTTTAC
2601 CGGACAACTC AGCGGCAGCA AGGANACAGC ATTACACTTA AAAGACAGCG
2651 AATGGACGCT GCCGTCAGGC ACGGAATTAG GCAATTTAAA CCTTGACAAC
2701 GCCACCATTA CACTCAATTC CGCCTATCGC CACGATGCTG CAGGCGCGCA
2751 AACCGGCAGN GTGTCAGACA CGCCGCGCCG CCGTTCGCGC CGTTCCCTAT
2801 TATCCGTTAC ACCGCCAACT TCGGTAGAAT CCCGTTTCAA CACGCTGACG
2851 GTAAACGGCA AATTGAACNG TCAAGGAACA TTCCGCTTTA TGTCGGAACT
2901 CTTCGGCTAC CGAAGCGACA AATTGAAGCT GGCGGAAAGT TCCGAAGGNA
2951 CTTACACCTT GGCGGTCAAC AATACCGGCA ACGAACCCGT AAGCCTCGAT
3001 CAATTGACGG TAGTGGAAGG GAAAGACAAC AAACCGCTGT CCGAAAACCT
3051 TAATTTCACC CTGCAAAACG AACACGTCGA TGCCGGCGCG TGGCGTTACC
3101 AACTCATCCG CAAAGACGGC GAGTTCCGCC TGCATAATCC GGTCAAAGAA
3151 CAAGAGCTTT CCGACAAACT CGGCAAGGCA GAAGCCAAAA AACAGGCGGA
3201 AAAAGACAAC GCGCAAAGCC TTGACGCGCT GATTGCGGCC GGGCGCGATG
3251 CCGCCGAAAA GACAGAAAGC GTTGCCGAAC CGGCCCGGCH GGCAGGCGGG
3301 GAAAATGTCG GCATTATGCA GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC
3351 GGATAAAGAC AGCGCHTTGG CGAAACAGCG CGAAGCGGAA ACCCGGCCGG
3401 NTACCACCGC CTTCCCCCGC GCCCGCNGCG CCCGCCGGGA TTTGCCGCAA
3451 CCGCAGCCCC AACCGCAACC TCAACCCCAA CCGCAGCGCG ACCTGATNAG
3501 CCGTTATGCC AATAGCGGTT TGAGTGAATT TTCCGCCACG CTCAACAGCG
3551 TTTTCGCCGT ACAGGACGAA TTGGACCGCG TGTTTGCCGA AGACCGCCGC
3601 AACGCNGTTT GGACAAGCNG CATCCGGNAC ACCAAACACT ACCGTTCGCA

3651 AGATTTCCGC GCCTACCGCC AACAAACCGA CCTGCGCCAA ATCGGTATGC 

3701 AGAAAAACCT CGGCAGCGGG CGCGTCGGCA TCCTGTTTTC GCACAACCGG 

3751 ACCGAAAACA NCTTCGACGA CGGCATCGGC AACTCGGCAC GGCTTGCCCA 

3801 CGGCGCCGTT TTCGGGCAAT ACGGCATCGG CAGGTTCGAC ATCGGCATCA 

3851 GCACGGGCGC GGGTTTTAGC AGCGGCANTC TNTCAGACGG CATCGGAGGC 

3901 AAAATCCGCC GCCGCGTGCT GCATTACGGC ATTCAGGCAC GATACCGCGC 

3951 CGGTTTCGGC GGATTCGGCA TCGAACCGTA CATCGGCGCA ACGCGCTATT 

4001 TCGTCCAAAA AGCGGATTAC CGCTACGAAA ACGTCAATAT CGCCACCCCC 

4051 GGTCTTGCGT TCAACCGNTA CCGNGCGGGC ATTAAGGCAG ATTATTCATT 

4101 CAAACCGGCG CAACACATNT CCATCACNCC TTATTTNAGC CTGTCCTATA 

4151 CCGATGCCGC TTCGGGCAAA GTCCGAACAC GCGTCAATAC CGCNGTATTG 

4201 GCTCAGGATT TCGGCAAAAC CCGCAGTGCG GAATGGGGCG TAAACGCCGA 

4251 AATCAAAGGT TTCACGCTGT CCNTCCACGC TGCCGCCGCC AAAGGNCCGC 

4301 AACTGGAAGC GCAACACAGC GCGGGCATCA AATTAGGCTA CCGCTGGTAA 

This encodes a protein having amino acid sequence <SEQ ID 652>:

1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG

101 VAALVGDQYI VSVAHNGGYN NVDFGAEGXN PDQHRFSYQI VKRNNYKPDN 

151 SHPYNGDXHM PRLHKFVTDA EPVEMTSDMR GNTYSDKEKY PERVRIGSGH 

201 HYWRYDDDKH GDLSYSGAWL IGGNTHMQGW GNNGVXSLSG DVRHANDYGP 

251 MPIAGAAGDS GSPMFIYDKT NNKWLLNGVL QTGYPYSGRE NGFQLIRKDW 

301 FYDDIYRGDT HTVXFEPRSN GHFSFTSNNN GTGTVTETNE KVSNPKLKVQ 

351 TVRLFDESLN ETDKEPVYAA GGVNQYRPRL NNGENLSFID YGNGKLILSN 

401 NINQGAGGLY FEGDFTVSPE NNETWQGAGV HISEDSTVTW KVNGVANDRL 

451 SKIGKGTLHV QAKGENQGSI SVGDGTVILD QQADDKGKKQ AFSEIGLXSG 

501 RGTVQLNADN QFNPDKLYFG FRGGRLDLNG HSLSFHRIQN TDEGAMIXXH 

551 NATTTSTVTI TGNESITQPS GKNINRLNYS KEIAYNGWFG EKDTTKTNGR 

601 LNLVYQPAAE DRTXLLSGGT NLNGNITQTN GKLFFSGRPT PHAYNHLGSG 

651 WSKMEGIPQG EIVWDNDWIX RTFKAENFHI QGGQAVISRN VAKVEGDXHL 

701 SNHAQAVFGV APHQSHTICT RSDWTGLTNC VEXXITDDKV IASLTKTDXS 

751 GXVXLXXXXX XXLXGXAXLX GNLSANGDTR YTVSHNATQN GNLSLVGNAQ 

801 ATFNQATLNG NXSXSGNASF NLSNNAAQNG SLTLSDNAKA NVSHSALNGN 

851 VSLADKAVFH FENSRFTGQL SGSKXTALHL KDSEWTLPSG TELGNLNLDN 

901 ATITLNSAYR HDAAGAQTGX VSDTPRRRSR RSLLSVTPPT SVESRFNTLT

951 VNGKLNXQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN NTGNEPVSLD 

1001 QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG EFRLHNPVKE

1051 QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAAEKTES VAEPARXAGG 

1101 ENVGIMQAEE EKKRVQADKD SALAKQREAE TRPXTTAFPR ARXARRDLPQ 

1151 PQPQPQPQPQ PQRDLXSRYA NSGLSEFSAT LNSVFAVQDE LDRVFAEDRR 

1201 NAVWTSXIRX TKHYRSQDFR AYRQQTDLRQ IGMQKNLGSG RVGILFSHNR 

1251 TENXFDDGIG NSARLAHGAV FGQYGIGRFD IGISTGAGFS SGXLSDGIGG 

1301 KIRRRVLHYG IQARYRAGFG GFGIEPYIGA TRYFVQKADY RYENVNIATP 

1351 GLAFNRYRAG IKADYSFKPA QHXSITPYXS LSYTDAASGK VRTRVNTAVL 

1401 AQDFGKTRSA EWGVNAEIKG FTLSXHAAAA KGPQLEAQHS AGIKLGYRW* 

A transmembrane region is underlined. 



ORF1-1 shows 86.3% identity over a 1462aa overlap with ORF1a:

10 20 30 40 50 60 

orfla.pep MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 
I I (I I I I I I I I I 1 I 1 I 1 1 II I I I ! I I I I I I I I I II I I II I I I I I II I I I 1 1 I I I I I I M I 
orfl-1 MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFG I LPQAWAGHTYFG IN YQYYRDFAEN 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f la . pep KGKFAVGAKD IEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQY I VSVAHNGGYN 

| | I i I I I I I 11 I I M 1 I I I I I I I I I I I I I I I I I 1 I 1 I I I I I M M I I I I I i I I I ! I I I II 
orfl-1 KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 

70 80 90 100 110 120 

130 140 150 160 170 179 

orf la . pep NVDFGAEGXN PDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTS DM 

| 1 1 I I I I I 11 I II I i : I : I II I I I I I : : 111:11 1 1 I I I 1 I I I I I I I I I I I I 1 I 
orfl-1 NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 

130 140 150 160 170 180 
orfla.pep
orfl-1 

orfla.pep 
orfl-1 

orfla.pep 
orfl-1 

orfla.pep 
orfl-1 

orfla.pep 
orfl-1 

orfla.pep 
orfl-1 

orfla.pep 
orfl-1 

orfla.pep 
orfl-1 

orfla.pep 
orfl-1 

orfla.pep 
orfl-1 

orfla.pep 
orfl-1 

orfla.pep 
orfl-1 
220 230
-WLIGGNTHMQGWGNN 

MM I: 



190 200 210 220 230 



200 



280 



290 



240 250 260 270 

GWSLSGD-\^DYGPMPIAGAAGDSGSPMFIYDKTNNKWLLN<^^GYPYSGRENG 

GTWLGSEKIKHS-PYGFLPTOTSFGDSGSPMF^ 

2S0 260 ? ™ ?R0 29U 



270 



280 



340 



300 310 320 330 

FQLIRKDWFfDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP- 

300 310 320 330 340 350 



330 



340 



350 
-KLKVQT 



400 



410 



360 



360 370 380 390 

VRLFDESLNETDKEFVY-AAGGVNQTOPRl^NGENLSFIDY 

VQLIWSLSETA^PVYHAAGGW 

370 380 390 400 410 



460 



470 



420 430 440 450 

FEGDFTVSPEIWETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQ^GEN 

I • I II I I It 1 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 I M M I H M 1 1 1 1 1 1 1 1 I 1 1 1 1 1 M N I Jin 

4 20 430 440 450 460 470 

480 490 500 510 520 530 

SVGDGTVILDQQADDKGKKQAFSEIGI^SG 

,imi 1 1 Mil 1 1 1 1 1 1 1 1 1 1 1 1 1 M M 1 1 1 1 1 1 m ill l m M ii M m M in 

490 500 510 520 530 



480 



580 



590 



540 



540 550 560 570 

DEGAMIXXHNATTTSTVTITGNESITQPSGKNINRLNYSKEIAYNGWFG 

i I I I I I || | | | | | | ] | :: | : : 1 : I 1 I : : M I I 1 I I I I I 

HSLSF^IQNTDEGAMIVNHNQDKEST^ 

560 570 580 590 



HSLSFHRIQNT! 

Mimmimini 



550 



640 



650 



600 610 620 630 

EKDTTKTNGRLNLVYQPAAEDRTXLLSGGTNLNGNITQTNGKLFFSGRWP^YNHI^SG 

mimtmimmmii u n n m n i u 1 1 1 m 1 1 1 1 n 1 1 1 in .n^s 

EKDTCKTOGRI^ 

620 630 640 650 



600 



610 



700 



710 



660 670 680 690 

WSKMEGI PQGEIVWDNDWIXRTFKAEN FHI QGGQAVI SRNVAKVEGDXHLSNI^QAVFGV 
ii. | i i i . i i i i i i 1 1 | I | I I | | | M : | : | | I I I : I 1 1 I I I I : I I 1 I I I I I I I I I I I 
W SQKEGI PRGEI WDNDWINRT FKAEN FQI KGGQAWSRNVAKVKG DWHL SNHAQAVFGV 
660 670 680 690 



700 



710 



760 



770 



720 730 740 750 

APHQSHTICTRSDWTGLTNC^XXITDDKVIASLTKTDXSGXVXUOCXXXXX^^UC 

mimiiiiimiium .iiiiiinmiii m i i 1:1 1:1 
aphqshtictr: 



SDWTGLTNCVEKTITDDKVIASLTKTDISGOTDIADHAHLNLTGIATLN 



720 



730 



740 



750 



760 



770 



820 



830 



780 790 800 810 

GNLSANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNXS^ 

intiiiiiiiiMiiiimmimmmiiiiMi:) mi mm: mm 

800 810 820 830 



780 



790 



880 



890 



840 850 860 870 

SLTI^DNAKANVSHSAI^GNVS^ 

11 I til I I I 111 I I I I I I I I m I I I I III I - I I I H I : 1 1 : 1 m Ml I III I I M I 
840 850 860 870 880 890 
900 910 920 930 940

orfla.pep telgnlnldnatitlnsayrhdaagaqtgxvsdtprrRsrrs— llsvtpptsvesrfn 
1 1 I 1 I | 1 1 1 I I I I 1 1 I 1 I I I I I i I I I 1 I I r : 1 : 1 I I 1 1 1 1 1 I I I I I I I I I I H I I I 

orfl-1 TELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFN 
900 910 920 930 940 950 

950 960 970 980 990 1000 

or f la . pep TLTVNGKLNXQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPVSLDQLTVVEG 

lit I! II I! f 1 I 1 I t I M f 11 I 1 I I I I 1 I I 1 I I I K M M 1 1 1 M M I I : II : I UJ I M 
orfl-1 TLTVNGKLNGQGT FRFMSELFG YRS DKLKLAE S SEGT YTLAVNNTGNE PAS LE QLTWEG 

960 9*70 980 990 1000 1010 

1010 1020 1030 1040 1050 1060 

orfla.pep KDNKPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAE 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I 1 I I I I I I I I I I i I 
orfl-1 KDNKPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAE 
1020 1030 1040 1050 1060 1070 

1070 1080 1090 1100 1110 1120 

orfla.pep KDNAQS LDALI AAGRDAAEKTE SVAEPARXAGGENVG IMQAEEEKKRVQADKDS ALAKQR 

II I I I I I \ I N I I I I I I : I I I II I I I I I I ■ I I I I i 1 I I t I I I I I I I I | I I I I I : I I I 1 I I 
orfl-1 KDNAQSLDALIAAGRDAVEKTESVAEPARQAGGENVGIMQAEEEKKRVQADKDT ALAKQR 

1080 1090 1100 1110 1120 1130 

1130 1140 1150 1160 1170 1180 

orf la . pep EAETRPXTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAV 
lliill I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl-1 EAETRPATTAFPRARRARRDLPQLQPQPQPQP — QRDLISRYANSGLSEFSATLNSVFAV 

1140 1150 1160 1170 1180 1190 

1190 1200 1210 1220 1230 1240 

orf la . pep QDELDRVFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 
I | I | I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I t I I I I II I II I I I t I M I I II 
orfl-1 QDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 
1200 1210 1220 1230 1240 1250 



40 



1250 1260 1270 1280 1290 1300 

orfla.pep HNRTENXFDDGIGNSARLAHGAVFGQYGIGRFDIGISTGAGFSSGXLSDGIGGKIRRRVL 
I | I | II : I I I I I I I I I I I I I I i I I I I I I I II 1111:1111111 I I I I I I I I I I I I I I 
orfl-1 HNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGGKIRRRVL 
1260 1270 1280 1290 1300 1310 



45 



1310 1320 1330 1340 1350 1360 

orfla.pep HYGIQARYRAGFGGFGIEPYIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSF 
I I I I I I I I I I I I I I II I I I : I I I II I I I I I I I I I I I I I I I I I I i I I I I I I M I I M I I I I 
orfl-1 H YG I QAR YRAG FGG FG I E PH I GAT R Y FVQKAD YR YEN VN I AT PG LAFN R YRAG I KAD Y S F 

1320 1330 1340 1350 1360 1370 



50 



55 



60 



1370 1380 1390 1400 1410 1420 

orf la . pep KPAQHXSITPYXSLS YTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHA 

Mill lllll M I I 1 1 I i M ! 1 M I I 1 M I I 1 M I I I I I M I I I I I I M i I I I I I It 
orfl-1 KPAQH I S IT PYLS LS YT DAASGKVRT R VNTAVLAQD FGKTRSAE WGVNAE I KGFT LS LHA 

1380 1390 1400 1410 1420 1430 

1430 1440 1450 

orf la . pep AAAKG PQLEAQHS AG I KLG YRWX 

I I I I I Mi I I I I 111 I I I I I I I I 
orfl-1 AAAKG PQLEAQHSAG IKLG YRWX 

1440 1450 

Homology with adhesion and penetration protein hap precursor of H.influenzae (accession number P45387)
Amino acids 23-423 of ORF1 show 59% aa identity with hap protein in 450aa overlap: 



65 



orfl 23 FXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAENKGKFAVGAKDIEVYNKKGELVG 82 

F +L C+S GI QAWAGHT Y FGI + YQYYRDFAENKGKF VGAK+IEVYNK+G+LVG 
hap 6 FRLN FLTACVSLGI ASQAWAGHTY FG I DYQYYRDFAENKGKFTVGAKNIEVYNKEGQLVG 65 

orfl 83 KSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYNNVDFGAEGXNIXDQXRXTYKIV 142 

SMTKAPMI DFSWSRNGVAALVG QY I VSVAHNGGYN+VDFGAEG N DQ R TY+IV 
hap 66 TSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYNDVDFGAEGRN-PDQHRFTYQIV 124 
orf1 143 KRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSYMDGRKYIDQNNYPDRVRIGAGR 202

KRNNY+A + HPY GDYHMPRLHK VT+AEPV MT+ MDG+ Y D+ NYP+RVRIG+GR 
hap 125 KRNNYQAWERKHPYDGDYHMPRl^KmEAEPVGMTTNMDGKVYADRENYPERVRIGSGR 184 

orfl 203 QYWRSDEDEPNNRESSYHIA 222 

QYWR+D+DE N SSY+++ 
hap 185 QYWRTDKDEETNVHSSYYVSGAYRYLTAGNTHTQSGNGNGTVNLSGNWSPNHYGPLPTG 244

orf1 223 SGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGFQLVRKDWFYDEIFAGDTHSVF 277

SGSPMFIYDA+K++WLIN VLQTG+P+ G+ NGFQL+R++WFY+E+ A DT SVF
hap 245 GSKGDSGSPMFIYDAKKKQWLINAVLQTGHPFFGRGNGFQLIREEWFYNEVIAVDTPSVF 304

orfl 278 --YEPRQNGKYSFNDDNNGTGKIN-AKHEHNSLPNRLKTRTVQLFNVSLSETAREPVYHA 334 
15 y p N6 YSF +N+GTGK+ + + + + TV+LFN SL++TA+E V A 

hap 305 QRYIPPINGHYSFVSKNDGTGKLTLTRPSKDGSKAKSEVGTVKLFNPSLNQTAKEHV-KA 363 

orfl 335 AGGVNS YRPRLNNGENI S FI DEGKGELILTSN INQGAGGLYFQGDFTV- S PENNETWQGA 393 
A G N Y+PR+ G+NI D+GKG L + +NINQGAGGLYF+G+F V +NN TWQGA 
20 hap 364 AAG YN I YQPRMEYGKN I YLGDQGKGTLT IENN INQGAGGL Y FEGN FWKGKQNNITWQGA 423 

orfl 394 GVHISEDSTVTWKVNGVANDRLSKIGKGTL 423 

GV I +D+TV WKV+ NDRLSKIG GTL 
hap 424 GVSIGQDATVEWKVHNPENDRLSKIGIGTL 453 

Amino acids 715-1011 of ORF1 show 50% aa identity with hap protein in 258aa overlap:

Orfl 41 DTRYTVSHNATQ-NGNXSLVXNAQATFNQ-ATLNGNTSASGNASFNLSDHAVQNGSLTLS 98 

DT+ S TQ NG+ +L NA + A LNGN + ++ F LS++A Q G++ LS 
hap 733 DTKVINSIPITQINGSINLTNNATVNIHGLAKLNGNVTLIDHSQFTLSNNATQTGNIKLS 792 

30 orfl 99 GN AKANV SHSALNGN VS LADKAVFHFE S SR FTGQI SGGKDT ALHLKDSEWTLPSGXELGN 158 

+A A V+++ LNGNV L D A F ++S F QI G KDT + L+++ WT+PS L N 
hap 7 93 NHANATVNNATLNGNVHLTDSAQFSLKNSHFWHQIQGDKDTTVTLENATVmiPSDTTLQN 852 

orfl 159 LN LDNAT I TLNSAYRHDAAGAQTGS ATDAPXXXXXXXXXXLLXVT PPT SVESRFNTLTVN 218 
35 L L+N+T+TLNSAY + S+ +AP L T PTS E RFNTLTVN 

hap 853 LTLNNSTVTLNSAY S AS SNN APRHRRS LETETTPTSAEHRFNTLTVN 899 

orfl 219 GKLNGQGTFRFMSELFGYR^DKLKIJ^SSEGTYTLAVNNTGNEPASLEQLTVVEGKDNKP 278 
GKL+GQGTF+F S LFGY+SDKLKL+ +EG YTL+V NTG EP +LEQLT++E DNKP 
40 hap 900 GKLSGQGTFQFTSSLFGYKSDKLKLSNDAEGDYTLSVRNTGKEPVTLEOLTLIESLDNKP 959 

orfl 27 9 LS EN LN FTLQNEHVDAGA 296 

LS+ L FTL+N+HVDAGA 
hap 960 LSDKLKFTLENDHVDAGA 977 

Amino acids 1192-1450 of ORF1 show 41% aa identity with hap protein in 259aa overlap:

Orfl 1 LDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNR 60 

LDR+F + ++AVWT+ +D + Y S FRAY+Q+T+LRQIG+QK L +GR+G +FSH+R 
hap 1135 LDRLFVDQAQSAVWTNIAQDKRRYDSDAFRAYQQKTNLRQIGVQKALANGRIGAVFSHSR 1194 

50 orfl 61 TENTFDDGIGNSARLAHGAVFGQYGIDRFYXXXXXXXXXXXXXXXXXIGXKXRRRVLHYG 120 

++NTFD+ + NAL+FQY KR+ ++YG 

hap 1195 SDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISASKMAEEQSRKIHRKAINYG 1254 

orfl 121 IQARYRAG FGGFGIE PHIGATRYFVQKADYRYENVN I AT PGLAFNRYRAG IKADYS FKPA 180 
55 + A Y+ G GI+P+ G RYF+++ +Y+ E V + TP LAFNRY AGI+ DY+F P 

hap 1255 VNASYQFRLGQLGIQPYFGVNRYFIERENYQSEEVRVKTPSLAFNRYNAGIRVDYTFTPT 1314 

orfl 181 QHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHAAAA 240 
+IS+ PY ++Y D +4 V+T VN VL Q FG+ E G+ AEI F +S + + 
60 hap 1315 DNISVKPYFFVNYVDVSKANVCyiTVNLTVLQQPFGRYWQKEVGLKAEILHFQISAFISKS 1374 
orf1 241 KGPQLEAQHSAGIKLGYRW 259

+G OL Q + G+KLGYRW 
hap 1375 QGSQLGKQQNVGVKLGYRW 1393 
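The identity percentages quoted for these alignments can be illustrated with a short calculation. The sketch below assumes that identity is counted as the number of aligned columns carrying the same residue, divided by the total number of aligned columns (gap columns included in the length). That is the usual convention for pairwise alignment reports, but the text does not state which program settings were used, and the function name `percent_identity` is illustrative only.

```python
def percent_identity(row_a: str, row_b: str) -> tuple[int, int, float]:
    """Return (identical columns, overlap length, percent identity) for two aligned rows.

    Both rows must already be aligned and of equal length; '-' marks a gap
    and never counts as a match.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    if not row_a:
        raise ValueError("aligned rows must not be empty")
    identical = sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")
    overlap = len(row_a)
    return identical, overlap, 100.0 * identical / overlap


# Final row of the third ORF1/hap alignment block above.
orf1_row = "KGPQLEAQHSAGIKLGYRW"
hap_row = "QGSQLGKQQNVGVKLGYRW"
identical, overlap, pct = percent_identity(orf1_row, hap_row)
print(f"{identical}/{overlap} identical ({pct:.1f}%)")
```

This covers only the last 19 columns of the block, so its value is not expected to equal the 41% reported for the full 259aa overlap; applying the same function to the concatenated rows of a whole block gives the block-level figure.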
Homology with a predicted ORF from N.gonorrhoeae

The blocks of ORF1 show 83.5%, 88.3%, and 97.7% identities in 467, 298, and 259 aa overlap,
respectively, with a predicted ORF (ORF1ng) from N.gonorrhoeae:
orfl.pep

orf lng 

orfl.pep 

orf ing 

orf 1 .pep 

orf lng 

orfl.pep 

orf lng 

orf 1 .pep 

orflng 

orfl.pep 

orflng 

orfl.pep 

orflng 

orfl.pep 

orflng 

orfl.pep 

orflng 

orfl.pep 

orflng 

orfl.pep 

orflng 

orfl.pep 

orflng 

orfl.pep 

orflng 

orfl.pep 

orflng 

orfl.pep 

orflng 

orfl.pep 

orflng 



MKTTDKRTTETHRKAPKTGRIRFXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 

II I I I t I I 1 I 1 1 1 1 I I I I 1 I i 1 I II tlllllll I I I i 1 I I M I I I I I I I N I 

MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 



FQGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGT 
I : I : I I I 1 1 : 1 I I I I I I I I I II 1: I I I I I I I I I I I I I I I I I I I I 1 1 
FEGN FTVS PKNNETWQG AGVH I S DGST VTWKVNG VAN DRLS KI GKGTL LVQAKGENQGS V 

// 

DKVTASLTKTDISGNVDLADHAHLNLTGLA 
III 111:111: I I I r | | | | t I I I 1 I I I I 
FGVAPHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDVRGNVSLADHAHLNLTGLA 



60 
60 



L20 



KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYN 
I I I I I I I I I I ! I I I I M I I I I M I I I M I I I i I I | I I 1 I | ! I I I : I Mlllllitllll 
KGKFAVGAKDIEVYNKKG£LVGKS^^TKAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN 120 

NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 180 
I M II II I I II I : I : I I I 11 I I i I M : I I i t I I I I I ! I M I I II I I I I I I I I I 1 
NVDFGAEG SN - PDQHR FS YQI VKRNN YKAGTNGK PYGGDYHM PRLHKFVT DAE PVEMT S Y 179 

M DGRK Y I DQN N Y P DRVR I GAG RQYWR S DE DE PNN RESSYHIAS 223 

Mill! I * | I 1 | f I I | I 1 I 1 I 1 I I I I I I I I I 1 I t I I 1 ! I t 

MDGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSG 239 

GSPMFIYDA QKQKWLIN GVLOTGNPYIGKSNG 255 

I II 1 I I 1 I I I I I I I I I I i I i I I I I I ! 1 1 I I I I 
GGTVNLGSEKIKHSPY G FLPTGGS FGDSG S PMFI YDA QKQKWLIN GVLOTGN PYIGKSNG 2 89 

FQLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRT 315 
1 | J I I I I I I I I I I I I II 1 I I I I I I I : I II I I lll:ill:ill:lll:| III llllll 
FQLVRKDWFYDEirAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRT 359 

VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLY 375 
M I I I I I I I I I I I I I I I M I M I I I I i I i I 11 I i M I I I I: 1 I I I I I I I I I I I I I I ! I I I 
VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDKGKGELILTSNINOGAGGLY 



422 



479 



744 



774 



803 



TLNGNLSANGDTR-YTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHA 
|:|||| ::::ll : I M I II I III 1 I I I I I 1 1 I I II I I I I 11 1111111::! 
TFNGNL-VQAETRTIRLRANATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNA 833 

VQNGSLTLSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWT 863 
Mil II II I 1 I I I II M I I I I I III I M I M I I 11: I M I t : I I I I I I I It I II 11 11 I 
VQNGSLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWT 893 

LPSGXE LGNLNLDNAT I T LN SAYRKDAAGAQTG SATDAPRRRSRRSRRSLLXVT PPT SVE 923 
I I I I : I I I II I I I I I I I I I I II I II I I I I I II I I I : I I I I I I I I I I II I I I I 1 I : I 
LPSGTELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRS LLSVTPPTSAE 950 

SRFNTLTVNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLT 983 
t I 1 1 I I I 1 I I 1 I I I t I I I I I 1 I I t I I I I I I I I I M I I I I I I I I t I I I I I I I I : i I 1 I I I 
SRFNTLTVNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYT1AVNNTGNEPVSLEQLT 1010 

WEGKDNKPLSENLNFT LQNEHVDAGAW 1011 

NiMH 1 1 1 1 1 1 1 1 1 1 Milium 

WEGKDNTPLSENLN FTLQNEHVDAGAWRYQLI RKDGEFRLHNPVKEQELS DKLGKAGET 1070 

// 

LDRVTAEDRRNAVWTSGIRDTKHYRSQDFR 1211 
IMIMIIMMIIMM M III llillll 
PQRDLISRYANSGLSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 1239 

AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFY 1271 

1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 in mm inn iii tun mm i Mini n 

AYRQQT DLRQIGMQKN LG SGRVG I LFSHNRTGNT FDDG I GN S ARLAHGAVFGQYG IGRFD 1299 
IGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1331

1 I I I I 1 1 I I 1 I I I 1 I 1 I I I I I I I I I t I I 1 I I I I 1 I M 1 I I I I 1 I I I I I I I I f I I I t I 
IGISAGAGFSSGSLSDGIRGKIRRRVLHYGIQARYRAGFGGEX3IEPHIGATRYFVQKADY 



1359 



RYENVNIATPGIAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVL 1391 
ItltllllllllllllllMIIIIIIIIMIIillllMlMllltlllMIIMIIItl 
RYENVNIATPGIAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVOTAVL 1419 

AQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRW 1440 
I I I I 1 ! t I I I t I I 1 1 I t I I I I I I 1 I I I I I I I I t I I I I f I I I i i 1 1 1 1 I I 
AQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRW 1468 



The complete length ORF1ng nucleotide sequence was identified <SEQ ID 653>:

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCTAA 
51 AACCGGCCGC ATCCGCTTCT CGCCCGCTTA CTTAGCCATA TGCCTGTCGT 
101 TCGGCATTCT GCCCCAAGCC CGGGCGGGAC ACACTTATTT CGGCATCAAC 
151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 
201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 
251 CGATGACGAA AGCCCCGATG ATTGATTTTT CTGTGGTATC GCGTAACGGC 
301 GTGGCGGCAT TGGCGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 
351 CGGCTATAAC AATGTTGATT TTGGTGCGGA GGGAAGCAAT CCCGATCAGC 
401 ACCGCTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA AGCAGGGACT 
451 AACGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCACAAATT
501 TGTCACAGAT GCAGAACCTG TTGAGATGAC CAGTTATATG GATGGGTGGA 
551 AATACGCTGA TTTAAATAAA TACCCTGATC GTGTTCGAAT CGGAGCAGGC 
601 AGACAATATT GGCGGTCTGA TGAAGACGAA CCCAATAACC GCGAAAGTTC 
651 ATATCATATT GCAAGCGCAT ATTCTTGGCT CGTCGGTGGC AATACCTTTG 
701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG CGAAAAAATT 
751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG 
801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 
851 ATGGGGTATT GCAAACAGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 
901 CAGCTAGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 
951 CCATTCAGTA TTCTACGAAC CACATCAAAA TGGGAAATAC TTTTTTAACG 
1001 ACAATAATAA TGGCGCAGGA AAAATCGATG CCAAACATAA ACACTATTCT 
1051 CTACCTTATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 
1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGGGTCAACA 
1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACAAA 
1201 GGAAAAGGTG AATTGATACT TACCAGCAAC ATCAACCAAG GCGCGGGCGG 
1251 TTTGTATTTT GAGGGTAATT TTACGGTCTC GCCTAAAAAC AACGAAACGT 
1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGATG GCAGTACCGT TACTTGGAAA 
1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 
1401 GCTGGTTCAA GCCAAAGGGG AAAACCAAGG CTCGGTCAGC GTGGGCGACG 
1451 GTAAAGTCAT CTTAGATCAG CAGGCGGACG ATCAAGGCAA AAAACAAGCC 
1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGGACGGTGC AACTGAATGC 
1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC 
1601 GTTTGGATTT GAACGGGCAT TCGCTTTCGT TCCACCGCAT TCAAAATACC 
1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 
1701 TACCATTACA GGCAATAAAG ATATTACTAC AACCGGCAAT AACAACAACT 
1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 
1801 GCAACCAAAA CGAACGGGCG GCTCAATCTG AATTACCAAC CGGAAGAAGC 
1851 GGATCGCACT TTACTGCTTT CCGGCGGAAC AAATTTAAAC GGCAATATCA 
1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCGAC ACCGCACGCC 
1951 TACAATCATT TAGGAAGCGG GTGGTCAAAA ATGGAAGGTA TCCCACAAGG 
2001 AGAAATCGTG TGGGACAACG ATTGGATCGA CCGCACATTT AAAGCGGAAA 
2051 ACTTCCATAT TCAGGGCGGA CAAGCGGTGG TTTCCCGCAA TGTTGCCAAA 
2101 GTGGAAGGCG ATTGGCATTT AAGCAATCAC GCCCAAGCAG TTTTCGGTGT 
2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 
2201 TGACAAGTTG TACCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 
2251 TTGAGCAAGA CCGACATCAG AGGCAATGTC AGCCTTGCCG ATCACGCTCA 
2301 TTTAAATCTC ACAGGACTTG CCACACTCAA CGGCAATCTT AGTGCAGGCG 
2351 GAGACACGCA CTATACGGTT ACGCGCAACG CCACCCAAAA CGGCAACCTC 
2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG 
2451 CAACACATCG GCTTCGGACA ATGCTTCATT TAATCTAAGC AACAACGCCG 
2501 TACAAAACGG CAGTCTGACG CTTTCCGACA ACGCTAAGGC AAACGTAAGC 
2551 CATTCCGCAC TCAACGGCAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA 
2601 TTTTGAAAAC AGCCGCTTTA CCGGAAAAAT CAGCGGCGGC AAGGATACGG 
2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCGGG CACGGAATTA 
2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG 
2751 ACACGATGCG GCAGGCGCGC AAACCGGCAG TGCGGCAGAT GCGCCGCGCC 
2801 GCCGTTCGCG CCGTTCCCTA TTATCCGTTA CGCCGCCAAC TTCGGCAGAA 
2851 TCCCGTTTCA ACACGCTGAC GGTAAACGGC AAATTGAACG GTCAGGGAAC
2901 ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA CCGCAGCGGC AAATTGAAGC

2951 TGGCGGAAAG TTCCGAAGGC ACTTACACCT TGGCTGTCAA CAATACCGGC 

3001 AACGAACCCG TAAGTCTCGA GCAATTGACG GTAGTGGAAG GAAAAGACAA 

3051 CACACCGCTG TCCGAAAATC TTAATTTCAC CCTGCaaaAC gaacacgtcg 

3101 atgccggcgc atggCGTTAT CAGCTTATCC gcaaagacgG CGAGTTCCgc 

3151 CTGCATAATC CGGTCAAAGA ACAAGAGCTT TCCGACAAAC TCGGCAAGgc 

3201 gggagaaACA GAggccgccT TGACGGCAAA ACAGGCacaA CTTGCCGCCA 

3251 AAcaacaggc ggaaaAAGAC AACgcgcaaa gccttgAcgc gctgattgcg 

3301 gCcgggcgca atgccaccga AAAGGCAgaa agtgtrgccg aaccgGCCCG 

3351 GCAGGCAGGC GGGGAAAAtg ccgGCATTAT GCAGGCGGAG GAAGAGAAAA 

3401 AACGGGTGCA GGCGGATAAA GACACCGCCT TGGCGAAACA GCGCGAAGCG 

3451 GAAACCCGGC CGGCTACCAC CGCCTTCCCC CGCGCCCGCC GCGCCCGCCG 

3501 GGATTTGCCG CAACCGCAGC CCCAACCGCA ACCCCAACCG CAGCGCGACC 

3551 TGATCAGCCG TTATGCCAAT AGCGGTTTGA GTGAATTTTC CGCCACGCTC 

3601 AACAGCGTTT TCGCCGTACA GGACGAATTG GACCGCGTGT TTGCCGAAGA 

3651 CCGCCGCAAC GCCGTTTGGA CAAGCGGCAT CCGGGACACC AAACACTACC 

3701 GTTCGCAAGA TTTCCGCGCC TACCGCCAAC AAACCGACCT GCGCCAAATC 

3751 GGTATGCAGA AAAACCTCGG CAGCGGGCGC GTCGGCATCC TGTTTTCGCA 

3801 CAACCGGACC GGAAACACCT TCGACGACGG CATCGGCAAC TCGGCACGGC

3851 TTGCCCACGG TGCCGTTTTC GGGCAATACG GCATCGGCAG GTTCGACATC 

3901 GGCATCAGCG CGGGCGCGGG TTTTAGTAGC GGCAGCCTTT CAGACGGCAT 

3951 CAGAGGCAAA ATCCGCCGCC GCGTGCTGCA TTACGGCATT CAGGCAAGAT 

4001 ACCGCGCAGG TTTCGGCGGA TTCGGCATCG AACCGCACAT CGGCGCAACG

4051 CGCTATTTCG TCCAAAAAGC GGATTACCGA TACGAAAACG TCAATATCGC

4101 CACCCCGGGC CTTGCATTCA ACCGCTACCG CGCGGGCATT AAGGCAGATT

4151 ATTCATTCAA ACCGGCGCAA CACATTTCCA TCACGCCTTA TTTGAGCCTG

4201 TCCTATACCG ATGCCGCTTC CGGCAAAGTC CGAACGCGCG TCAATACCGC

4251 CGTATTGGCG CAGGATTTCG GCAAAACCCG CAGTGCGGAA TGGGGCGTAA

4301 ACGCCGAAAT CAAAGGTTTC ACGCTGTCCC TCCACGCTGC CGCCGCCAAG

4351 GGGCCGCAAT TGGAAGCGCA GCACAGCGCG GGCATCAAAT TAGGCTACCG

4401 CTGGTAA

This is predicted to encode a protein having amino acid sequence <SEQ ID 654>: 

1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA RAGHTYFGIN 

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG

101 VAALAGDQYI VSVAHNGGYN NVDFGAEGSN PDQHRFSYQI VKRNNYKAGT 

151 NGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGWKYADLNK YPDRVRIGAG 

201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI 

251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF

301 QLVRKDWFYD EIFAGDTHSV FYEPHQNGKY FFNDNNNGAG KIDAKHKHYS

351 LPYRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDK 

401 GKGELILTSN INQGAGGLYF EGNFTVSPKN NETWQGAGVH ISDGSTVTWK 

451 VNGVANDRLS KIGKGTLLVQ AKGENQGSVS VGDGKVILDQ QADDQGKKQA 

501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT 

551 DEGAMIVNHN QDKESTVTIT GNKDITTTGN NNNLDSKKEI AYNGWFGEKD 

601 ATKTNGGLNL NYPPEEADRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA 

651 YNHLGSGWSK MEGIPQGEIV WDNDWIDRTF KAENFHIQGG QAVVSRNVAK

701 VEGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTSCTEK TITDDKVIAS 

751 LSKTDVRGNV SLADHAHLNL TGLATFNGNL VQAETRTIRL RANATQNGNL 

801 SLVGNAQATF NQATLNGNTS ASDNASFNLS NNAVQNGSLT LSDNAKAHVS 

851 HSALNGNVSL ADKAVFHFEN SRFTGKISGG KDTALHLKDS EWTLPSGTEL 

901 GNLNLDNATI TLNSAYRHDA AGAQTGSAAD APRRRSRRSL LSVTPPTSAE 

951 SRFNTLTVNG KLNGQGTFRF MSELFGYRSG KLKLAESSEG TYTLAVNNTG 

1001 NEPVSLEQLT VVEGKDNTPL SENLNFTLQN EHVDAGAWRY QLIRKDGEFR

1051 LHNPVKEQEL SDKLGKAGET EAALTAKQAQ LAAKQQAEKD NAQSLDALIA 

1101 AGRNATEKAE SVAEPARQAG GENAGIMQAE EEKKRVQADK DTALAKQREA 

1151 ETRPATTAFP RARRARRDLP QPQPQPQPQP QRDLISRYAN SGLSEFSATL 

1201 NSVFAVQDEL DRVFAEDRRN AVWTSGIRDT KHYRSQDFRA YRQQTDLRQI 

1251 GMQKNLGSGR VGILFSHNRT GNTFDDGIGN SARLAHGAVF GQYGIGRFDI 

1301 GISAGAGFSS GSLSDGIRGK IRRRVLHYGI QARYRAGFGG FGIEPHIGAT 

1351 RYFVQKADYR YENVNIATPG LAFNRYRAGI KADYSFKPAQ HISITPYLSL 

1401 SYTDAASGKV RTRVNTAVLA QDFGKTRSAE WGVNAEIKGF TLSLHAAAAK 

1451 GPQLEAQHSA GIKLGYRW* 

Underlined and double-underlined sequences represent the active site of a serine protease (trypsin 
family) and an ATP/GTP-binding site motif A (P-loop). 
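As an illustration of how such a motif can be located, the following minimal sketch scans a protein fragment for the P-loop consensus. It assumes that "ATP/GTP-binding site motif A" refers to the PROSITE-style pattern [AG]-x(4)-G-K-[ST]; the function name `find_p_loops` and the choice of fragment (residues 281-300 of the ORF1ng protein, with unclear characters resolved against the nucleotide sequence) are illustrative and are not taken from the original analysis.

```python
import re

# Assumed consensus for ATP/GTP-binding site motif A (P-loop):
# [AG]-x(4)-G-K-[ST]. The lookahead allows overlapping hits to be reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein, offset=1):
    """Return (start position, matched segment) for each P-loop-like match.

    `offset` is the sequence position of the first residue of `protein`,
    so that hits in a fragment are reported in full-length coordinates.
    """
    return [(m.start() + offset, m.group(1)) for m in P_LOOP.finditer(protein)]


# Hypothetical usage: residues 281-300 of the ORF1ng protein listing.
fragment = "WLINGVLQTGNPYIGKSNGF"
hits = find_p_loops(fragment, offset=281)
print(hits)
```

Under these assumptions the fragment contains a single candidate, GNPYIGKS, beginning at residue 290. A pattern match of this kind only flags a candidate site and does not by itself demonstrate nucleotide binding.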



ORF1-1 and ORF1ng show 93.7% identity in 1471 aa overlap:
MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN
| I I | | | | I | | I II I I it I I I I I I I M I H I I I I I I I I I 1 I | I I M M I M II I I I II I I 
110



120 



70 80 90 100 

KGKFAVC^DIEVYNKKGELVGKSMTKAPMIDFSW 

I I I I | | | | II I I I I | | | | | | I I II II I I I I I I I I I t I I I I I I H : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [ 
KGKFAVGAKD I EVTWKKGELVGKSMTKAPM I DFS WS RNGVAALAGDQYI VS VAHNGG YK 

90 100 HO 120 



70 
170



180 



130 140 150 160 

NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 

11,11,1! | in || |:|:||||IMI II hlllllHMM I inillllimiMM 
NVDFGAEGSN^^ 

150 160 170 180 



130 



140 



190 200 210 220 230 240 

DGRKYI DQNN YPDRVRI GAGRQYWRS DE DEPNNRES S YH I AS AY SWLVGGNT FAQNG SGG 

,, ii i i : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > JL 

DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 

210 220 230 240 



190 



200 



250 260 270 280 290 300 

GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 

I IMIIH llll I II II HIM IIH M I I I 1 I 1 1 I 1 I I 1 I I 1 I I 1 1 i 1 1 1 I 1 1 1 1 I 1 1 1 
GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 
250 260 270 280 290 300 

310 320 330 340 350 360 

QLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTV 

Hill 1111 I II I Mill I II I lUMIM III: II l: I I I: Ml: I III MMIII 
OLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 

310 320 330 340 350 360 

370 380 390 400 410 420 

QLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYF 

| ,| | | | | ,1 ,| | I | | I II II I II I I I II I I I I M I I I I I : M M I II I M I II I I Ml I I 
OLFNVSLSETARE PVYHAAGGVNS YRPRLNNGEN I S FI DKGKGELI LTSN INQGAGGLYF 
370 380 390 400 410 420 

430 440 450 460 470 480 

OGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSIS 

rTmiiimiiMiiiiiii: iimiimmiimmn iimiiimi:i 

EGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSVS 
430 440 450 460 470 480 

490 500 510 520 530 540 

VGDGTVILDQQADDKGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 

HI, | mil I M:MM I M MM M MM Mill Mill Ml MINI Ml I I 11 II 
VGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 
490 500 510 520 530 540 

550 560 570 580 590 600 

SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDIATTGNNNSLDSKKEIAYNGWFGEKD 

iiiMiMiMiMMMmiiimiiiiMMMMimmmiiimiMM 

SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDITTTGNNNNLDSKKEIAYNGWFGEKD 
550 560 570 580 590 600 

610 620 630 640 650 660 

TTKTNGRLNLNTYQPAAEDRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLNDHWSQ 

mimiii m mMimiiMiiiiMimmiiiimmi:: if. 

ATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSGWSK 
610 620 630 640 650 660 

670 680 690 700 710 720 

KEGIPRGEIVWDNDWINRTFKAENFQIKGGQAVVSRNVAKVKGDWHLSNHAQAVFGVAPH 
M M : M I I M M M : M II M M : I : M M M II M M I : M II M I I M II M M I I 
MEG I PQGE I VWDN DW I DRT FKAEN FHI QGGQAWSRNVAKVEG DWHLSNHAQAV FGVAPH 
670 680 690 700 710 720 
730 740 750 760 770 780

QSHTICTRSDOTGLTNCVEKTITDDKV^ 

1 1 It I II II I I I I M : I : U I I M | | | I M I = I 1 1 1 I 1 1 : I I I I I I H I I M I I I 1 1 1 1 
GSHTICTRSDW^^ 

■730 740 750 760 770 780 

790 800 810 820 830 840 

SANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLT 

I i. I I I- | | | :: I II I 1 II II I I II II I I I I I M I I i I II I I I I I I I I I : : i I i I 1 > I I 
SAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNGSLT 

790 800 810 820 830 840 

850 860 870 880 890 900 

LSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGTEL 

U 1 | 1 1 n I I I 1 I 1 1 I i 1 1 I I I I I II I I : M I I I • I M I ! I H 11 I I » I M I I < > ' J 
LSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSGTEL 

850 860 870 880 890 900 

910 920 930 940 950 960 

GNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFNTLT 

I | I I I I I I f i I I I I I 1 I 1 1 I I I I 1 I I I I = 1 I 1 I I 1 t I illllMIIICHIIllll 

GNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSR RSLLSVTPPTSAESRFNTLT 

910 920 930 940 950 

970 980 990 1000 1010 1020 

VNGKLNGQGTFRrMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTVVEGKDN 
lltmmilMMIliilSI MMMIIIIIMIIMMIIimililMimil 
VNGKLNGQGT FRFMSELFGYRSGKLKLAES SEGT YTLAVNNTGNEPVSLEQLT WEGKDN 
960 970 980 990 1000 1010 

1030 1040 1050 1060 1070 
KPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHKPVKEQELSDKLGKA 

| M | | l | II I I I 1 1 1 1 1 1 1 I I I M I I II I I 1 1 1 1 I I II I I I I I I M I I I 
TPLSENLKFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 

1020 1030 1040 1050 1060 1070 

1080 1090 1100 1110 1120 
EAKKQAEKDNAQSLDALIAAGRDAVEKTESVAEPARQAGGENVGIMQAEEEKKRVQ 

| | : | | I I I I I I 1 1 1 1 I I I I I I : I : I I : I I M I I I I 1 1 I I I 1 : 1 1 I M I III 1 1 N 
QAQLAAKQQAEKDNAQSLDALIAAGRNATEKAESVAEPARQAGGENAGIMQAEEEKKRVO 
1080 1090 HOC 1110 1120 1130 

1130 1140 1150 1160 1170 1180 

ADKDTALAKQREAETRPATTAFPRARRARRDLPQLQPQPQPQPQRDLISRYANSGLSEFS 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 n 1 1 i i ii 1 1 1 1 MiiiMiMimiMiimm 

ADKDTALAKQREAETRPATTAFPRARRARRDLPQPQPQPQPQPQRDLISRYANSGLSEFS 
U40 1150 1160 1170 1180 1190 

1190 1200 1210 1220 1230 1240 

ATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLG 

i ! 1 1 1 1 1 1 I I 1 I 1 1 I I 1 1 1 1 1 I I IIMIIINIIIIUilMllllllllllllllMII 
ATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLG 

1200 1210 1220 1230 1240 1250 

1250 1260 1270 1280 1290 1300 

SGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGI 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ii iMiiiimimiMi 

SGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSGSLSDGI 
1260 1270 1280 1290 1300 1310 

1310 1320 1330 1340 1350 1360 

GGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYR 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 s r 1 1 i 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 

RGK I RRRVLH YG I QARY RAG FGGFGIEPHI GATR Y FV QKAD YR YENVN I AT PG LAFNR YR 
1320 1330 1340 1350 1360 1370 

1370 1380 1390 1400 1410 1420 

AGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEI 
1 | | t I I I 1 I I 1 I I I I I 1 i 1 1 I I I I 1 1 I 1 1 I I 1 I i I 1 t I I I t I I t I I 1 I I I I I I I 1 I ! I I 1 
AGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEI 
1380 1390 1400 1410 1420 1430 
orf1-1.pep
orf1ng-1



1430 1440 1450 

KG FTLS LHAAAAKG PQLEAQHS AGI KLGYRWX 

ii in ii inn i ii imiii mm mini 

KG FTLS LHAAAAKG PQLEAQHS AG I KLGYRWX 
1440 1450 1460 



In addition, ORF1ng shows 55.7% identity with hap protein (P45387) over a 1455aa overlap:
SCORES Init1: 1104 Initn: 4632 Opt: 2680

Smith-Waterman score: 5165; 55.7% identity in 1455 aa overlap 
orf1ng-1.pep
p45387 



orf lng-l.pep 



p45387 



orf lng-l.pep 



10 20 30 40 50 

MKTT DKRTTETHRKAPKTGRI RFS P AYLAICLS FG I LPQARAGHT YFG "T^?"? 

| : I : I : I : I I : I I I II I I I I I : M M I I M I I 
MKKTVFKLN FLTAC IS LG I VSQAWAGHTY FG IDYQYYRDFAEN 
10 20 30 40 

70 80 90 100 110 120 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN 

M I I : I M : : I : I M I : I : II I I I II 1 1 1 1 1 II II II M II 1 1 : : II I II I I I I IM 
KGKETVGAQNIKVYNKQGQLVGTSMTKAPMIDFSW 

60 70 80 90 100 



50 



180 



p45387 



orf lng-1 -pep 



p45387 



orf lng-1 .pep 



130 140 150 160 170 

NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 

•MMMMMMMMMIMIIIM I Ml Ml MIMIMM I = = I M 1 
DVDFGAEGNNPDQHRFTYKIVKRNNYKKD-NLHPYEDDYHNPRLHKFVTEAAPIDMTSNM 

U0 120 130 140 150 160 

190 200 210 220 230 240 

DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 

: | I : I : M I : I II II : II I : M : I M : : 
NGSTYSDRTKYPERVRIGSGRQFWRNDQDKGD 

110 180 190 



MM :|::||l I U 
QVAGAYHYLTAGNTHNQRGAGN 

200 210 



p45387 



orf lng-l.pep 



p45387 



orf lng-1 .pep 



p4 5387 



orf lng-1 .pep 



250 260 270 280 290 300 

GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 
, ||:: | : M M Ml M MM I MM I: II MM M: I: MM M Ml 
GYSYLGGDVRKAGEYGPLPIAGSKGDSGSPMFIYDAEKQKWLINGILREGNPFEGKENGF 
220 230 240 250 260 270 

310 320 330 340 350 360 

QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 

1 1 | | I :: | MM I I : : I M I :: MM Ml I :: > ; 
QLVRKSVF-DEIFERDLHTSLYTRAGNGVYTISGNDNGQGSITQKS GIPSEIK 1 

280 290 300 310 320 

370 380 390 400 410 419 

QLFNVSLSETAREPVYHAA-GGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY 

I , -M :: |:: | I I Mill II:: Ml*- :l I I :: 1 : 1 1 1 1 1 1 Ml 
TLANMSLPLKEKDKVHNPRYDGPNIYSPRLNMGETLYEWDQKQGSLIFASDINQGAGGLY 

330 340 350 360 370 380 



470 



479 



p45387 



420 430 440 450 460 

FEGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV 

M M M II I : : M I M M I : I : I : : III M I m < : M M I M II M I II M II : M : 
FEGNFTVSPNSNQTWQGAGIHVSENSTVTWKVNGVEHDRLSKIGKGTLHVQAKGENKGSI 

400 410 420 430 440 



390 



530 



539 



480 490 500 510 520 

or f lno- 1 . pep SVGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQF^PDKLYFGFRGGRLDIJ^G 

M II M M I : I M M II : M II II I II M M M I M I Mil: M : II M I M II I M I 
SVGDGKVILEQQADDQGNKQAFSEIGLVSGRGTVQLNDDKQFDTDKFYFGFRGGRLDIjNG 

450 460 470 480 490 500 

540 550 560 570 580 590 

HSLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDITT-TGNN-NNLDSKKEIAYNGWFG 

I M : 1 : M M M M M M M I : : : M M M : : I : MM MM : I II M M I M 
HSLTFKRIQ^DEGAMIVNHKTTQAANVTITGNESIVLPNGNNINKLDYRKEIAYNGWFG 

510 520 530 540 550 560 



p45387 
600 610 620 630 640 650

orflna-1 pep EKDATKTNGRI^LNYQPEEADRTLLLSGGTNl^GNITQTNGKLFFSGR^P^YNHLGSG 

r>4S3S7 ETDKNKHNGRLNLIYKPTTEDRTLLLSGGTNLKGDITQTKGKLFFSGRPTPHAYNHLNKR 
P 570 580 590 600 610 620 

660 670 680 690 700 710 

or'lna-1 pep W SKMEG I PQGE I VWDN DWI DRT FKAEN FH IQGGQAW SRNVAKVEG DWHLSNHAQAVFGV 
9 P P M : | I t I I I I 1 1 I | | : | 1 1 : I I I I 1 1 I I : I : I I : N > I I I I : : : I I : I : I I : I : I j I 1 1 
D45387 WSEMEGIPQGEIVWDHDWINRTFKAENFQIKGGSAWSRNVSSIEGNWTVSNNANATFGV 
P 630 640 650 660 670 680 

720 730 740 750 760 770 

orflnq-1 pep APHQSHTICTRSDWTGLTSCTEKT1TDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLN 
9 P P : I: |:: II II II I I ||||:| : : I I I M I : I I : I I : : : I : I : I U Ml M 
o45387 VPNQQNTICTRSDWTGLTTCQKVDLTDTKVINSIPKTQINGSINLTDNATANVKGLAKLN 

P 690 700 710 720 730 740 

780 790 800 810 820 830 

orflna-1. pep GKLSAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNG 
o ng P F it.. Ill MM ! 

p 4 5387 GNVTL TNHSQFTLSNNATQIG 

750 760 '70 



orf lng-l.pep 



840 850 860 870 880 890 

SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSG 



. . 1:1::: HIM I : I : I I : : I I : I : : I : I I I : : I : : 11 = 11 

o45387 nirlSDNSTATVDNANLNGNVHLTDSAQFSLKNSHFSHQIQGDKGTTVTLENATWTMPSD 
P 780 790 800 810 820 830 

900 910 920 930 940 950 

orflna-1 pep TELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRSLLSVTPPTSAESRFNTLT 
9 " P P |||l:|:|:|lllllll ::|: 1:111111111111 

04 53B7 TT LQN LT LNN S T I TLN SAY S AS SN N T PRRRS LETETT PT S AEHR FNTLT 

y 840 850 860 870 

960 970 980 990 1000 1010 

o-f lna- 1 pep VWGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTWEGKDN 

11111:11111:1 I 1111:1 1 1 i I — = = I i ' 1 = 1 M I : I I : I I I I I : I I : 1 1 I 
D45387 VNGKLSGQGTFQFTSSLFGYKSDKLKLSNDAEGDYILSVRNTGKEPETLEQLTLVESKDN 
880 890 900 910 920 930 

1020 1030 1040 1050 1060 1070 

orflna-1 oep TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 
- * F ^ I I I:: I: I I I: 1:1 I I II I 1 1 : I : : : I I I I I N I I : I I I H : I :l ::l :l II 
^45387 Q P LS DKLKFT LEN DH V D AG ALR Y KL VKN DGE FR L HN P I KEQE L HN DL VRAEQAE RT LEAK 

P 940 950 960 970 980 990 

1080 1090 1100 1110 1120 . 1130 

orflnq-1 pep QAQLAAKQQAEKDNAQS LDAL IAAGRNAT -EKAESVAEPARQAGGENAGIMQAEEEKKRV 

|:: :|| |: : :::l I II :: • • • I MM M :::: : 1:1 
o4538 7 QVEPTAKTQTGEPKVRSRRAARAAFPDTLPDQSLLNALEAKQAE-LTAETQKSKAKTKKV 

1000 1010 1020 1030 1040 1050 

1140 1150 1160 1170 1180 1190 

orflna-1 oep QADK- - - DTALAKQREAETRPATT AFPRARRARRD- LPQPQPQPQPQPQRDLI SR Y AN S G 
9 -P P v ^ ^ ^ _ ( _ :::::| | | : : | : | : 1 1 I I I 1 : 1 1 : 

D45387 RS KRAV FS DPLLDQSL FALEAALE V I D APQQS EK DRLAQEEAEKQ- RKQKDL I S RY SN S A 

1060 1070 1080 1090 1100 1110 

1200 1210 1220 1230 1240 1250 

or*lna-l pep LSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQ-TDLRQIG 
9 U|:|||:||:::limil:l::: :l ::l I: Mil: I I MMIM 

d4 53 B 7 lselsatvnsmlsvqdeldrlfvdqaqsavwtniaqdkrrydsdafrayqqqktnlrqig 

1120 1130 1140 1150 1160 1170 

1260 1270 1280 1290 1300 1310 

orflna-1 pep MQKKLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSG 
9 ' PP :n :|||:|: MM: : I Ih : MM I : : : I : : : I : I : I s s 

o45387 VQKALANGRIGAVFSHSRSDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISAS 
P 1180 1190 1200 1210 1220 1230 
1320 1330 1340 1350 1360 1370

orflna-l.pep SLSDGIRGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGL 
9 :::: | | : | : : : : I I : : I I : : | : 1 1 : | : : I : : I M : : : : | : | : I : I I : I 

D45387 KMAEEQSRKIHRKAINYGVNASYQFRIX3QLGIQPYFGVNRYFIERENYQSEEVRVKTPSL 
P 1240 1250 1260 1270 1280 1290 



1380 1390 1^00 1410 1420 1430 

AFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEW 

I I || I I I I : : I I : i I : : : I I : II: : : I : I : : : : : I : 1 M : I I I II: : . I 
10 P4 5387 AFNRYNAGIRVDYTFTPTDNI SVKPYFFVNYVDVSNANVQTTVNLTVLQOPFGRYWQKEV 

1300 1310 1320 1330 1340 1350 



orf lng-1 .pep 



1440 1450 1460 1469 

orf lng-1 .pep GVNAEI KG FT LS LHAAAAKGPQLEAQHSAGIKLG YRWX 
IS I:: 111 1:1 : ::l II l:::l:IIIIM 

P 45387 GLKAEILHFQISAFISKSQGSQLGKQQNVGVKLGYRW 
1360 1370 1380 1390 
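The Smith-Waterman score quoted for the alignment above is the maximum of a local-alignment dynamic-programming matrix. The sketch below shows that recurrence in its simplest form. It assumes a flat match/mismatch score and a linear gap penalty (the parameters `match`, `mismatch` and `gap` are illustrative), whereas the score of 5165 reported here would have been obtained with an amino acid substitution matrix and gap penalties that are not stated in this section, so the sketch is not expected to reproduce that number.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty).

    Only two rows of the dynamic-programming matrix are kept, since the
    score (not the traceback) is all that is returned.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Hypothetical usage on short fragments sharing only the tripeptide GKS.
print(smith_waterman_score("AAAGKSAAA", "TTTGKSTTT"))
```

Replacing the flat match/mismatch term with a lookup in a substitution matrix, and the linear gap term with an affine one, gives the general form used by sequence-comparison programs.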

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 78

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 655>:

1 ..AAGGTGTGGC AATTTGTCGA AGA . CCGCTG CGTGCCGTCG TGCCTGCCGA 

51 CAGTTTTGAA CCGACCGCGC AAAAATTGAA CCTGTTTAAG GCGGGTGCGG 

101 CAACCATTTT GTTTTATGAA GATCAAAATG TCGTCAAAGG TTTGCAGGAG 

151 CAGTTCCCTG CTTATGCCGC TAACTTCCCC GTTTGGGCGg ATCAGGCAAA

201 CGCGATGGTG CAGTATGCCG TTTGGACGAC ACTTGCCGCG GTCGGCGTAG 

251 GTGCAAACCT GCAACATTAC AATCCCTTGC CCGATGCGGC GATTGCCAAA 

301 GCGTGGAATA TCCCCGAAAA CTGGTTGTTG CGCGCACAAA TGGTTATCGG 

351 CGGTATTGAA GGGGCGGCAG GTGAAAAGAC CTTTGAACCC GTTGCAGAAC 

401 GTTTGAAAGT GTTCGGCGCA TAA

This corresponds to the amino acid sequence <SEQ ID 656; ORF6>: 

1 ..KVWQFVEXPL RAVVPADSFE PTAQKLNLFK AGAATILFYE DQNVVKGLQE
51 QFPAYAANFP VWADQANAMV QYAVWTTLAA VGVGANLQHY NPLPDAAIAK
101 AWNIPENWLL RAQMVIGGIE GAAGEKTFEP VAERLKVFGA * 

Further sequence analysis revealed a further partial DNA sequence <SEQ ID 657>:

1 . . CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG CGCAAAAATT 

51 GAACCTGTTT AAGGCGGGTG CGGCAACCAT TTTGTTTTAT GAAGATCAAA 

101 ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC CGCTAACTTC 

151 CCCGTTTGGG CGGATCAGGC AAACGCGATG GTGCAGTATG CCGTTTGGAC 

201 GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT TACAATCCCT

251 TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA AAACTGGTTG 

301 TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG CAGGTGAAAA 

351 GACCTTTGAA CCCGTTGCAG AACGTTTGAA AGTGTTCGGC GCATAA 

This corresponds to the amino acid sequence <SEQ ID 658; ORF6-1>:

1 ..LRAVVPADSF EPTAQKLNLF KAGAATILFY EDQNVVKGLQ EQFPAYAANF

51 PVWADQANAM VQYAVWTTLA AVGVGANLQH YNPLPDAAIA KAWNIPENWL 
101 LRAQMVIGGI EGAAGEKTFE PVAERLKVFG A* 
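The correspondence between the partial DNA sequences and the amino acid sequences listed above can be checked with a short translation routine. The sketch below assumes the standard genetic code and reads the sequence in frame from its first base; the function name `translate` and the convention of writing X for any codon containing an undetermined base are illustrative choices.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = "".join(dna.split()).upper()  # drop the spacing used in the listings
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of <SEQ ID 657> and of <SEQ ID 655> as listed above.
print(translate("CTGCGTGCCG TCGTGCCTGC CGACAGTTTT"))
print(translate("AAGGTGTGGC AATTTGTCGA AGA.CCGCTG"))
```

The undetermined base in the first line of <SEQ ID 655> falls in the eighth codon, which is consistent with the X at position 8 of ORF6.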

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 
ORF6 shows 98.6% identity over a 140aa overlap with an ORF (ORF6a) from strain A of
N. meningitidis:

                                                10        20        30
orf6.pep                                KVWQFVEXPLRAVVPADSFEPTAQKLNLFK
                                        |||||||  |||||||||||||||||||||
orf6a     QIVEHAVLHTPSSFNSQSARVVVLFGEEHDKVWQFVEDALRAVVPADSFEPTAQKLNLFK
                 40        50        60        70        80        90

                  40        50        60        70        80        90
orf6.pep  AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf6a     AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY
                100       110       120       130       140       150

                 100       110       120       130       140
orf6.pep  NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
          |||||||||||||||||||||||||||||||||||||||||||||||||||
orf6a     NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
                160       170       180       190       200
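The identity figures quoted for these alignments are the fraction of aligned positions at which both sequences carry the same residue. A minimal sketch of that calculation follows. It assumes that the two strings are already aligned to equal length, that gap positions (written "-") are excluded from the overlap, and that an undetermined residue (X) counts as a mismatch; the function name `percent_identity` is illustrative.

```python
def percent_identity(a, b):
    """Return (identical positions, compared positions, percent identity).

    `a` and `b` must be aligned strings of equal length; positions where
    either sequence has a gap character '-' are not counted.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    matches = sum(x == y for x, y in compared)
    return matches, len(compared), 100.0 * matches / len(compared)


# First 30 aligned residues of ORF6 and ORF6a.
orf6 = "KVWQFVEXPLRAVVPADSFEPTAQKLNLFK"
orf6a = "KVWQFVEDALRAVVPADSFEPTAQKLNLFK"
matches, length, pct = percent_identity(orf6, orf6a)
print(matches, length, round(pct, 1))
```

The two mismatches in this segment (X/D and P/A) appear to be the only differences in the 140 aa overlap, and 138 identical positions out of 140 rounds to the 98.6% stated above.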

The complete length ORF6a nucleotide sequence <SEQ ID 659> is: 

1 ATGACCCGTC AATCTCTGCA ACAGGCTGCC GAAAGCCGCC GTTCCATTTA 

51 TTCGTTAAAT AAAAATCTGC CCGTCGGCAA AGATGAAATC GTCCAAATCG 

101 TCGAACACGC CGTTTTGCAC ACACCTTCTT CGTTCAATTC CCAATCTGCC 

151 CGTGTGGTCG TGCTGTTTGG CGAAGAGCAT GATAAGGTGT GGCAATTTGT 

201 CGAAGACGCG CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG 

251 CGCAAAAATT GAACCTGTTT AAGGCGGGTG CGGCAACTAT TTTGTTTTAT 

301 GAAGATCAAA ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC 

351 CGCCAACTTT CCCGTTTGGG CGGACCAGGC GAACGCGATG GTGCAGTATG 

401 CCGTTTGGAC GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT 

451 TACAATCCCT TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA 

501 AAACTGGTTG TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG 

551 CAGGTGAAAA GACCTTTGAA CCAGTTGCAG AACGTTTGAA AGTGTTCGGC 

601 GCATAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 660>: 

1 MTRQSLQQAA ESRRSIYSLN KNLPVGKDEI VQIVEHAVLH TPSSFNSQSA 

51 RVVVLFGEEH DKVWQFVEDA LRAVVPADSF EPTAQKLNLF KAGAATILFY

101 EDQNVVKGLQ EQFPAYAANF PVWADQANAM VQYAVWTTLA AVGVGANLQH

151 YNPLPDAAIA KAWNIPENWL LRAQMVIGGI EGAAGEKTFE PVAERLKVFG 

201 A* 





ORF6a and ORF6-1 show 100.0% identity in 131 aa overlap: 

                  50        60        70        80        90       100
orf6a.pep TPSSFNSQSARVVVLFGEEHDKVWQFVEDALRAVVPADSFEPTAQKLNLFKAGAATILFY
                                        ||||||||||||||||||||||||||||||
orf6-1                                  LRAVVPADSFEPTAQKLNLFKAGAATILFY
                                                10        20        30

                 110       120       130       140       150       160
orf6a.pep EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf6-1    EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA
                  40        50        60        70        80        90

                 170       180       190       200
orf6a.pep KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
          ||||||||||||||||||||||||||||||||||||||||||
orf6-1    KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
                 100       110       120       130

Homology with a predicted ORF from N. gonorrhoeae 

ORF6 shows 95.7% identity over a 140aa overlap with a predicted ORF (ORF6ng) from 
N. gonorrhoeae: 






orf6.pep                                KVWQFVEXPLRAVVPADSFEPTAQKLNLFK  30
                                        |||||||  |||||||||||||||||:|||
orf6ng    SNVSLDMSNPTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAVVPADSFEPTAQKLKLFK  64

orf6.pep  AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY  90
          ||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||||
orf6ng    AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHY 124

orf6.pep  NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGA 140
          |||||:||||||||||||||||||||||||||||||:|||||||||||||
orf6ng    NPLPDVAIAKAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGA 174

The complete length ORF6ng nucleotide sequence <SEQ ID 661> was identified as: 

1 ATGGCCGTTG CGTCAAATGT CAGCTTGGAT ATGTCCAATC CTACGGTGTT 

51 ACGCATGGGA TTACCCTTAT ATATTGCGTC CCTAAGAAGG GGCGCAATAT

101 ATAAGGTGTG GCAATTTGTC GAAGACGCGC TGCGTGCCGT CGTGCCTGCC 

151 GACAGTTTTG AACCGACCGC GCAAAAATTG AAGCTGTTTA AGGCGGGCGC 

201 GGCAACCATT TTGTTTTATG AAGATCAAAA TGTCGTCAAA GGTTTGCAGG 

251 AGCAGTTCCC TGCTTATGCC GCCAACTTTC CCGTTTGGGC GGACCAGGCG 

301 AACGCTATGG TACAGTATGC CGTCTGGACG ACACTTGCCG CGGTCGGTGC

351 AGGTGCAAAT CTGCAACATT ACAACCCCTT GCCCGATGTG GCGATTGCTA 

401 AAGCGTGGAA TATTCCCGAA AACTGGCTGT TGCGCGCGCA AATGGTTATC 

451 GGTGGTATTG AAGGGGcggc aggtgaaaaa gtctttgaac CCGTTGCgga 

501 acgtttgAAA GTGTTCGGCG CATAA 

This encodes a protein having amino acid sequence <SEQ ID 662>:

1 MAVASNVSLD MSNPTVLRMG LPLYIASLRR GAIYKVWQFV EDALRAVVPA

51 DSFEPTAQKL KLFKAGAATI LFYEDQNVVK GLQEQFPAYA ANFPVWADQA

101 NAMVQYAVWT TLAAVGAGAN LQHYNPLPDV AIAKAWNIPE NWLLRAQMVI 

151 GGIEGAAGEK VFEPVAERLK VFGA* 


ORF6ng and ORF6-1 show 96.9% identity in 131 aa overlap:

                                                  10        20        30
orf6-1.pep                                LRAVVPADSFEPTAQKLNLFKAGAATILFY
                                          ||||||||||||||||| ||||||||||||
orf6ng      PTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAVVPADSFEPTAQKLKLFKAGAATILFY
                 20        30        40        50        60        70

                    40        50        60        70        80        90
orf6-1.pep  EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA
            ||||||||||||||||||||||||||||||||||||||||||| |||||||||||| |||
orf6ng      EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHYNPLPDVAIA
                 80        90       100       110       120       130

                   100       110       120       130
orf6-1.pep  KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
            ||||||||||||||||||||||||||| ||||||||||||||
orf6ng      KAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGAX
                140       150       160       170
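
The identity figure quoted for this comparison can be reproduced by counting matching positions over the overlap. The sketch below is illustrative only: the function name `percent_identity` is hypothetical, the comparison is ungapped (the overlap shown contains no gaps), and the two strings are the 131 aligned residues as read from the alignment and from <SEQ ID 662>, so they inherit any transcription error in those listings.

```python
def percent_identity(a: str, b: str) -> tuple[int, int, float]:
    """Return (matches, overlap length, percent identity) for two
    equal-length, already-aligned sequences (no gap handling)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, len(a), 100.0 * matches / len(a)


# ORF6-1 residues 1-131 and the ORF6ng residues aligned with them (44-174).
ORF6_1 = (
    "LRAVVPADSFEPTAQKLNLFKAGAATILFY"
    "EDQNVVKGLQEQFPAYAANFPVWADQANAM"
    "VQYAVWTTLAAVGVGANLQHYNPLPDAAIA"
    "KAWNIPENWLLRAQMVIGGIEGAAGEKTFE"
    "PVAERLKVFGA"
)
ORF6NG = (
    "LRAVVPADSFEPTAQKLKLFKAGAATILFY"
    "EDQNVVKGLQEQFPAYAANFPVWADQANAM"
    "VQYAVWTTLAAVGAGANLQHYNPLPDVAIA"
    "KAWNIPENWLLRAQMVIGGIEGAAGEKVFE"
    "PVAERLKVFGA"
)

matches, overlap, pct = percent_identity(ORF6_1, ORF6NG)
print(f"{matches}/{overlap} = {pct:.1f}%")  # 127/131 = 96.9%
```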

It is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 79 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 663>:

1 GGCTACAACT ACCTGTTCGC GCGCGGCAGC CGCATCGCCA ACTACCAAAT
51 CAACGGCATC CCCGTTGCCG ACGCGCTGGC CGATACGGG^ CAATGCCAAC
101 ACCGCCGCCT ATGAGCGCGT AGAAGTCGTG CGCGGCGTGG CGGGGCTGCT
151 GGACGGCACG GGCGAGCCTT CCGCCACCGT CAATCTGGTG CGCAAACGCC
201 TGACCCGCAA GCCATTGTTT GAAGTCCGCG CCGAAGCgGG CAACCGcAAA
251 CATTTCGGGC TGGACGCGGA CGTATCGGGC AGCCTGAACA CCGAAG.crC
301 rCTGCGCgGC CGCCTGGTTT CCAcCTTCGG ACGCGGCGAC TCGTGGCGGC
351 GGCGCGAACG CAGCCGskAT GCCGAACTCT ACGGCATTTT GGAATACGAC
401 ATCGCACCGC AAACCCGCGT CCACGCArGC ATGGACTACC AGCAGGCGAA
451 AGAAACCGCC GACGCGCCGC TCAGcTACGC CGTGTACGAC AGCCAAGGTT
501 ATGCCACCGC CTTCGGCCCG AAAGACAACC CCGCCACAAA TTGGGCGAAC
551 AGCCACCACC GTGCGCTCAA CCTGTTCGCC GGCATCGAAC ACCGCTTCAA
601 CCAAGACTGG AAACTCAAAG CCGAATACGA CTAC..


This corresponds to the amino acid sequence <SEQ ID 664; ORF23>: 

1 ..GYNYLFARGS RIANYQINGI PVADALADTG NANTAAYERV EVVRGVAGLL

51 DGTGEPSATV NLVRKRLTRK PLFEVRAEAG NRKHFGLDAD VSGSLNTEXX

101 LRGRLVSTFG RGDSWRRRER SRXAELYGIL EYDIAPQTRV HAXMDYQQAK

151 ETADAPLSYA VYDSQGYATA FGPKDNPATN WANSHHRALN LFAGIEHRFN

201 QDWKLKAEYD Y..

Further work revealed the complete nucleotide sequence <SEQ ID 665>:

1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA
51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA
101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC
151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC
201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC
251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC
301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT
351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG
401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC
451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC
501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCTGACCCGC AAGCCATTGT
551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGACGCG
601 GACGTATCGG GCAGCCTGAA CACCGAAGGC ACGCTGCGCG GCCGCCTGGT
651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCGGCGCGAA CGCAGCCGCG
701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC
751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC
801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC
851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC
901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA
951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG
1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC
1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTGAT
1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA
1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC
1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA
1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA
1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG
1351 ATTTTGGGCG GACGATACAC CCGTTACCGC ACCGGCAGCT ACGACAGCCG
1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG
1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CTCTTTACGG CTCGTACAGC
1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA
1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG
1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC
1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC
1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA
1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC
1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT
1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA
1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC
1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG
2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA
2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC
2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA
2151 CGCGGCGTTT ACCTATCGGT TTAAATAA


This corresponds to the amino acid sequence <SEQ ID 666; ORF23-1>:

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN
51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG
101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER
151 VEVVRGVAGL LDGTGEPSAT VNLVRKRLTR KPLFEVRAEA GNRKHFGLDA
201 DVSGSLNTEG TLRGRLVSTF GRGDSWRRRE RSRDAELYGI LEYDIAPQTR
251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL
301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP
351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP
401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL
451 ILGGRYTRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS
501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN
551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR
601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA
651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH
701 YRTQPDRHSY GALRTVNAAF TYRFK*
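
A quick consistency check between a complete nucleotide listing and its protein is to compare lengths: a complete ORF of n nucleotides should encode n/3 - 1 residues, the last codon being the stop. The sketch below is illustrative only; both function names are hypothetical, and the row numbers and base counts are those read from the listings in this section (last row 2151 with 28 bases for <SEQ ID 665>, last row 501 with 25 bases for <SEQ ID 661>).

```python
def listing_length(last_row_start: int, last_row_bases: int) -> int:
    """Total length of a listing whose final row starts at the given
    1-based position and contains the given number of bases."""
    return last_row_start - 1 + last_row_bases


def expected_protein_length(nt_length: int) -> int:
    """Residues encoded by a complete ORF (start codon to stop codon)."""
    if nt_length % 3 != 0:
        raise ValueError("a complete ORF must be a whole number of codons")
    return nt_length // 3 - 1  # the terminal stop codon encodes no residue


orf23_1_nt = listing_length(2151, 28)
print(orf23_1_nt, expected_protein_length(orf23_1_nt))  # 2178 725

orf6ng_nt = listing_length(501, 25)
print(orf6ng_nt, expected_protein_length(orf6ng_nt))    # 525 174
```

Both results agree with the protein listings, which end at residue 725 (ORF23-1) and residue 174 (ORF6ng) before the terminal asterisk.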

Computer analysis of this amino acid sequence gave the following results: 

Homology with the ferric-pseudobactin receptor PupB of Pseudomonas putida (accession number P38047)

ORF23 and PupB protein show 32% aa identity in 205 aa overlap:

Orf23   6 FARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRK 65
          ++RG  I NY+++G+P +  L D  + + A ++RVE+VRG  GL+ G G PSAT+NL+RK
PupB  215 WSRGFAIQNYEVDGVPTSTRL-DNYSQSMAMFDRVEIVRGATGLISGMGNPSATINLIRK 273

Orf23  66 RLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFXXXXXXXXXXXXXXAE 125
          R T +    +  EAGN   +G   DVSG L     +RGR V+ +
PupB  274 RPTAEAQASITGEAGNWDRYGTGFDVSGPLTETGNIRGRFVADYKTEKAWIDRYNQQSOL 333

Orf23 126 LYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYD--SQGYATAFGPKDNPATNWAN 183
          +YGI E+D++  T +     Y   +   D+PL   +    S G  T      N A +W+
PupB  334 MYGITEFDLSEDTLLTVGFSY--LRSDIDSPLRSGLPTRFSTGERTNLKRSLNAAPDWSY 391

Orf23 184 SHHRALNLFAGIEHRFNQDWKLKAE 208
          + H   + F  IE +    W  K E
PupB  392 NDHEQTSFFTSIEQQLGNGWSGKIE 416

Homology with a predicted ORF from N. meningitidis (strain A)

ORF23 shows 95.7% identity over a 211 aa overlap with an ORF (ORF23a) from strain A of N.
meningitidis:

                                                  10        20        30
orf23.pep                                 GYNYLFARGSRIANYQINGIPVADALADTG
                                          ||||||||||||||||||||||||||||||
orf23a      QMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIPVADALADTG
                   90       100       110       120       130       140

                    40        50        60        70        80        90
orf23.pep   NANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDAD
            |||||||||||||||||||||||||||||||||||| |||||||||||||||||||| ||
orf23a      NANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRPTRKPLFEVRAEAGNRKHFGLGAD
                  150       160       170       180       190       200

                   100       110       120       130       140       150
orf23.pep   VSGSLNTEXXLRGRLVSTFGRGDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAK
            |||||| |  |||||||||||||||| ||||| ||||||||||||||||||| |||||||
orf23a      VSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGILEYDIAPQTRVHAGMDYQQAK
                  210       220       230       240       250       260

                   160       170       180       190       200       210
orf23.pep   ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYD
            |||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||
orf23a      ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRALNLFAGIEHRFNQDWKLKAEYD
                  270       280       290       300       310       320

orf23.pep   Y
            |
orf23a      YTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTHSASVSLIGKYRLFGREHDLIA
                  330       340       350       360       370       380

The complete length ORF23a nucleotide sequence <SEQ ID 667> is:

1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA
51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCAAAACCG CAGGAAAGCA
101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC
151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC
201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC
251 GCGACCAAAA CATCAAAGCG CTCGACCGCG CCCTGTTGCA GGCGACCGGC
301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT
351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG
401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC
451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC
501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCCGACCCGC AAGCCATTGT
551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGGCGCG
601 GACGTATCGG GCAGCCTGAA TGCCGAAGGC ACGCTGCGCG GCCGCCTGGT
651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCGCGAA CGCAGCCGCG
701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC
751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC
801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC
851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC
901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA
951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG
1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC
1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTAAT
1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA
1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC
1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA
1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA
1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG
1351 ATACTCGGCG GCAGATACAG CCGTTACCGC ACCGGCAGCT ACGACAGCCG
1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG
1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC
1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA
1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG
1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC
1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC
1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA
1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC
1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT
1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA
1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC
1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG
2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA
2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC
2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA
2151 CGCGGCGTTT ACCTATCGGT TTAAATAA



This encodes a protein having amino acid sequence <SEQ ID 668>:

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN
51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKA LDRALLQATG
101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER
151 VEVVRGVAGL LDGTGEPSAT VNLVRKRPTR KPLFEVRAEA GNRKHFGLGA
201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQRE RSRDAELYGI LEYDIAPQTR
251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL
301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP
351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP
401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL
451 ILGGRYSRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS
501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN
551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR
601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA
651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH
701 YRTQPDRHSY GALRTVNAAF TYRFK*



ORF23a and ORF23-1 show 99.2% identity in 725 aa overlap:

                    10        20        30        40        50        60
orf23a.pep  MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf23a.pep  PLGLPMTLREIPQSVSVITSQQMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARG
            ||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||
orf23-1     PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf23a.pep  SRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRPTR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||
orf23-1     SRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRLTR
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf23a.pep  KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGI
            |||||||||||||||||| |||||||| ||||||||||||||||||| ||||||||||||
orf23-1     KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf23a.pep  LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf23a.pep  NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf23a.pep  SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf23a.pep  FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYSRYRTGSYDSRTQGMTYVSANRFT
            |||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||
orf23-1     FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT
                   430       440       450       460       470       480

                   490       500       510       520       530       540
orf23a.pep  PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS
                   490       500       510       520       530       540

                   550       560       570       580       590       600
orf23a.pep  AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR
                   550       560       570       580       590       600

                   610       620       630       640       650       660
orf23a.pep  DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK
                   610       620       630       640       650       660

                   670       680       690       700       710       720
orf23a.pep  ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF
                   670       680       690       700       710       720

orf23a.pep  TYRFKX
            ||||||
orf23-1     TYRFKX
Homology with a predicted ORF from N. gonorrhoeae

ORF23 shows 93.4% identity over a 211 aa overlap with a predicted ORF (ORF23ng) from N.
gonorrhoeae:

orf23.pep            GYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLD  51
                     ||||||||||||||||||||||||||||||||||||||||||||||||| |
orf23ng     SAVDACRIPGYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLPD  60

orf23.pep   GTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFGR 111
            ||||||||||||||  |||||||||||||||||||| |||||||| |  |||||||||||
orf23ng     GTGEPSATVNLVRKHPTRKPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGR 120

orf23.pep   GDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYDSQGYATAF 171
            |||||  |||| ||||||||||||||||||| ||||||||||||||||||||||||||||
orf23ng     GDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAF 180

orf23.pep   GPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYDY 211
            |||||||||| ||  |||||||||||||||||||||||||
orf23ng     GPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHS 240

The ORF23ng nucleotide sequence <SEQ ID 669> is predicted to encode a protein comprising
amino acid sequence <SEQ ID 670>:

1 SAVDACRIPG YNYLFARGSR IANYQINGIP VADALADTGN ANTAAYERVE

51 VVRGVAGLPD GTGEPSATVN LVRKHPTRKP LFEVRAEAGN RKHFGLGADV

101 SGSLNAEGTL RGRLVSTFGR GDSWRQLERS RDAELYGILE YDIAPQTRVH

151 AGMDYQQAKE TADAPLSYAV YDSQGYATAF GPKDNPATNW SNSRNRALNL

201 FAGIEHRFNQ DWKLKAEYDY TRSRFRQPYG VAGVLSIDHS TAATDLIPGY

251 WHADPRTHSA SMSLTGKYRL FGREHDLIAG INGYKYASNK YGERSIIPKA

301 IPNAYEFSRT GAYPQPSSFA QTIPQYDTRR QIGGYLATRF RAADNLSLIL

351 GGRYSRYRAG SYNSRTQGMT YVSANRFTPY TGIVFDLTGN LSLYGSYSSL

401 FVPQLQKDEH GSYLKPVTGN NLEADIKGEW LEGRLNASAA VYRARKNNLA

451 TAAGRDQSGN TYYRAANQAK THGWEIEVGG RITPEWQIQA GYSQSKPRDQ

501 DGSRLNPDSV PERSFKLFTA YHLAPEAPSG RTIGAGVRRQ GETHTDPAAL

551 RIPNPAAKAR AVANSRQKAY AVADIMARYR FNPRTELSLK VDNLFNKHYR

601 TQPDRHSYGA LRTVNAAFTY RFK*

Further work revealed the complete nucleotide sequence <SEQ ID 671>:

1 ATGACACGCT TCAAATACTC CCTGCTTTTT GCCGCCCTGC TACCCGTGTA

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA 

101 CCGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CCGTTTCCGG CACGCACACC CCGTTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CCGGACGGCA CGGGCGAGCC 

501 TTCTGCCACC GTCAATCTGG TACGCAAACA CCCGACCCGC AAGCCATTGT

551 TTGAAGTCCG CGCCGAAGCC GGCAACCGCA AACATTTCGG GCTGGGCGCG 

601 GACGTATCGG GCAGCCTGAA CGCCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCTCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CAGACGCGCC

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CAAAAGACAA CCCCGCCACA AATTGGTCGA ACAGCCGCAA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATAGA ACACCGCTTC AACCAAGACT GGAAACTCAA 

951 AGCCGAATAC GACTACACCC GTAGCCGCTT CCGCCAGCCC TACGGTGTGG 

1001 CAGGCGTACT TTCCATCGAC CACAGCACTG CCGCCACCGA CCTGATTCCC

1051 GGTTATTGGC ACGCcgatCC GCGCACCCAC AGCGCCAGCA TGTCATTGAC 

1101 CGGCAAATAC CgcctGTTCG GCCGCGAGCA CGATTTAATC GCGGGTATCA 

1151 ACGGCTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATTCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGCG CCTATCCGCA 

1251 GCCATCATCG TTTGCCCAAA CCATCCCGCA ATACGACACC AGGCGGCAAA

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 

1351 ATACTCGGCG GCAGATACAG CCGCTACCGC GCAGGCAGCT ACAACAGCCG 
1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG

1451 GCATCGTGTT CGATCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC

1501 AGCCTGTTCG TCCCGCAATT GCAAAAAGAC GAACACGGCA GCTACCTGAA

1551 ACCCGTAACC GGCAACAATC TGGAAGCCGA CATCAAAGGC GAATGGCTTG

1601 AAGGGCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC

1651 CTCGCCACCG CAGCAGGACG CGACCAGAGC GGCAACACCT ACTATCGCGC

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGCT ACAGCCAAAG CAAACCCCGC

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTAcCCG AACGCAGCTT

1851 CAAACTCTTC ACCGCCTACC ACTTAGCCCC CGAAGCCCCC AGCGGCCGGA

1901 CCATcggTGC GGGTGTGCGC CGGCAGGGCG AAACCCACAC CGACCCAGCC

1951 GCGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG TCGCCAACAG

2001 CCGCCAGAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA

2051 ATCCGCGCAC CGAACTGTCG CTGAACGTGG ACAACCTGTT CAACAAACAC

2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA

2151 CGCGGCGTTT ACCTATCGGT TTAAATAA



This corresponds to the amino acid sequence <SEQ ID 672; ORF23ng-1>:

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN
51 DGYTVSGTHT PFGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG
101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER
151 VEVVRGVAGL PDGTGEPSAT VNLVRKHPTR KPLFEVRAEA GNRKHFGLGA
201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQLE RSRDAELYGI LEYDIAPQTR
251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWSNSRNRAL
301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HSTAATDLIP
351 GYWHADPRTH SASMSLTGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP
401 NAIPNAYEFS RTGAYPQPSS FAQTIPQYDT RRQIGGYLAT RFRAADNLSL
451 ILGGRYSRYR AGSYNSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS
501 SLFVPQLQKD EHGSYLKPVT GNNLEADIKG EWLEGRLNAS AAVYRARKNN
551 LATAAGRDQS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKPR
601 DQDGSRLNPD SVPERSFKLF TAYHLAPEAP SGRTIGAGVR RQGETHTDPA
651 ALRIPNPAAK ARAVANSRQK AYAVADIMAR YRFNPRTELS LNVDNLFNKH
701 YRTQPDRHSY GALRTVNAAF TYRFK*


ORF23ng-1 and ORF23-1 show 95.9% identity in 725 aa overlap:

                    10        20        30        40        50        60
orf23-1.pep MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23ng-1   MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf23-1.pep PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG
            | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23ng-1   PFGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf23-1.pep SRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRLTR
            |||||||||||||||||||||||||||||||||||||||| |||||||||||||||  ||
orf23ng-1   SRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLPDGTGEPSATVNLVRKHPTR
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf23-1.pep KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI
            |||||||||||||||||| |||||||| |||||||||||||||||||  |||||||||||
orf23ng-1   KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQLERSRDAELYGI
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf23-1.pep LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL
            |||||||||||||||||||||||||||||||||||||||||||||||||||| ||| |||
orf23ng-1   LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWSNSRNRAL
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf23-1.pep NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH
            ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||
orf23ng-1   NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHSTAATDLIPGYWHADPRTH
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf23-1.pep SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS
            ||| || ||||||||||||||||||||||||||||||||||||||||||||||||||| |
orf23ng-1   SASMSLTGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPSS
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf23-1.pep FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT
            |||||||| ||||||||||||||||||||||||||| ||| ||| |||||||||||||||
orf23ng-1   FAQTIPQYDTRRQIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTYVSANRFT
                   430       440       450       460       470       480

                   490       500       510       520       530       540
orf23-1.pep PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS
            |||||||||||||||||||||||||| ||||||||||||||||||| |||||||||||||
orf23ng-1   PYTGIVFDLTGNLSLYGSYSSLFVPQLQKDEHGSYLKPVTGNNLEADIKGEWLEGRLNAS
                   490       500       510       520       530       540

                   550       560       570       580       590       600
orf23-1.pep AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR
            |||||||||||||||||| ||||||||||||||||||||||||||||||||||||||| |
orf23ng-1   AAVYRARKNNLATAAGRDQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPR
                   550       560       570       580       590       600

                   610       620       630       640       650       660
orf23-1.pep DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK
            |||||||||||||||||||||||| ||||||| ||||||| | ||||||| |||||||||
orf23ng-1   DQDGSRLNPDSVPERSFKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAK
                   610       620       630       640       650       660

                   670       680       690       700       710       720
orf23-1.pep ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF
            |||  ||||||||||||||||||||| |||||||||||||||||||||||||||||||||
orf23ng-1   ARAVANSRQKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF
                   670       680       690       700       710       720

orf23-1.pep TYRFKX
            ||||||
orf23ng-1   TYRFKX
In addition, ORF23ng-1 shows significant homology with an OMP from E. coli:

sp|P16869|FHUE_ECOLI OUTER-MEMBRANE RECEPTOR FOR FE(III)-COPROGEN, FE(III)-
FERRIOXAMINE B AND FE(III)-RHODOTRULIC ACID PRECURSOR >gi|1651542|gnl|PID|d1015403
(D90745) Outer membrane protein FhuE precursor [Escherichia coli]
>gi|1651545|gnl|PID|d1015405 (D90746) Outer membrane protein FhuE precursor
[Escherichia coli] >gi|1787344 (AE000210) outer-membrane receptor for Fe(III)-
coprogen, Fe(III)-ferrioxamine B and Fe(III)-rhodotrulic acid precursor
[Escherichia coli] Length = 729
Score = 332 bits (843), Expect = 3e-90

Identities = 228/717 (31%), Positives = 350/717 (48%), Gaps = 60/717 (8%)
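
The percentages in this summary line follow directly from the counts. As a hedged illustration, the sketch below recomputes them with integer truncation, which is consistent with the three values printed here (228/717 is about 31.8% but is reported as 31%); the function name `blast_percent` is hypothetical, and truncation is inferred from these figures rather than taken from any program documentation.

```python
def blast_percent(count: int, alignment_length: int) -> int:
    """Whole-number percentage, truncated rather than rounded."""
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * count) // alignment_length


# Counts taken from the summary line above (alignment length 717).
for label, count in (("Identities", 228), ("Positives", 350), ("Gaps", 60)):
    print(f"{label} = {count}/717 ({blast_percent(count, 717)}%)")
# Identities = 228/717 (31%)
# Positives = 350/717 (48%)
# Gaps = 60/717 (8%)
```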

Query: 38  TITVTADRTASSN--DGYTVSGTHTPFGLPMTLREIPQSVSVITSQQMRDQNIKTLDRAL 95
           T+ V    TA  +  + Y+V+ T     + MT R+IPQSV++++ Q+M DQ ++TL   +
Sbjct: 43  TVIVEGSATAPDDGENDYSVTSTSAGTKMQMTQRDIPQSVTIVSQQRMEDQQLQTLGEVM 102


Query: 


96 


Sbjct: 


103 


Query: 


148 


Sbjct: 


155 


Query: 


207 


Sbjct: 


215 


Query: 


267 



LQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIP- 
G S+ SDRA Y ++RG +1 NY ++GIP 



- VADALADTGN ANT AA 14 7 

+ DAL+D A 
SLGDALSDM AL 154 



+ERVEWRG GL GTG PSA +N+VRKH T + 



+V AE G+ 



AD+ 



NAEGTLRGRLVSTFGRGDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADA 266 

+G +R R+V + DSW S GI++ D+ T + AG +YQ+ + 

TEDGKIRARIVGGYQNNDSWLDRYNSEKTFFSGIVDADLGDLTTLSAGYEYQRIDVNSPT 274 

P LS YAVY DSQG YAT AFG PKDN P ATN W SN S RN RALNL FAG I EHR FNQ DWK LKAE YDYT RS R 326 
+++ G + ++ + A +W+ + +F ++ +F W+ ++ 

Sbjct: 275 WGGLPRWNTDGSSNSYDW^STAPDWAYNDKEINKVFMTLKQQFADTWQATLNATHSEVE 334 

Query: 327 F--RQPYGVAGVLSIDHSTAA--TDLIPGY-------WHADPRTHSA-SMSLTGKYRLFG 374
           F  +  Y  A V   D       ++  PG+       W++  R   A  +   G Y LFG
Sbjct: 335 FDSKMMYVDAYVNKADGMLVGPYSNYGPGFDYVGGTGWNSGKRKVDALDLFADGSYELFG 394

Query 375 REHDLI AGING YKYASNKYGER — S 1 1 PNAI PNAYEFSRTGAYPQPS S FAQTI PQYDTRR 432 
R+H+L+ G Y +N+Y +1 P+ I + Y F+ G +PQ Q++ Q DT _ 

Sbjct: 395 RQHNLMFG-GSYSKQNNRYFSSWANIFPDEIGSFYNFN--GNFPQTDWSPQSLAQDDTTH 451

Query 433 QIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTY-VSANRFTPYTGIVFDXXX 491 

Y ATR AD L LILG RY+ +R + +TY + N TPY G+VFD 

Sbjct: 452 MKSLYAATRVTLADPLHLILGARYTNWRVDT LT YSMEKNHTT PYAGLVFDIND 504 

Query: 4 92 XXXXXXXXXXXFVPQLQKDEHGSYIJCPVTGNNLEADIKGEWLEGRLNASAAVYT^RKNNL 551 

F PQ +D G YL P+TGNN E +K +W+ RL + A++R ++N+ 
Sbjct: 505 NWSTYASYTSIFQPQNDRDSSGKY1APITGNNYELGLKSDWMKSRLTTTLAIFRIEQDNV 564 

Query: 552 ATAAGR---DQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPRDQDGSRLN 608

A + G +G T Y+A + +GEE+GIT WQ+ 6H D +G+ +N 

Sbjct: 565 AQSTGTPIPGSNGETAYKAVDGTVSKGVEFELNGAITDNWQLTFGATRYIAEDNEGNAVN 624 

Query: 609 PDSVPERSFKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAKARAVANSR 668 
P ++P + K+FT+Y LP P T+G GV Q +TD P RA 

Sbjct: 625 P-NLPRTTVKMFTSYRL-PVMPE-LTVGGGVNWQNRVYTDTV TPYGTFRA E 672 

Query: 669 QKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRH-SYGALRTVNAAFTYRF 724 
Q +YA+ D+ RY+ L NV+NLF+K Y T + YG R + TY+F 

Sbjct: 673 QGSYALVDLFTRYQVTKNFSLQGNVNNLFDKTYDTNVEGSIVYGTPRNFSITGTYQF 729

Based on this analysis, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF23-1 (77.5kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
15A shows the results of affinity purification of the His-fusion protein, and Figure 15B shows the
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 15C) and for ELISA (positive result). These
experiments confirm that ORF23-1 is a surface-exposed protein, and that it is a useful immunogen.
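
A molecular mass such as the one quoted for ORF23-1 can be estimated from an amino acid sequence by summing average residue masses and adding one water molecule. The sketch below shows the arithmetic only. The function name `protein_mass_da` is hypothetical, the residue masses are standard average values, and the text does not state which form of the protein (for example with or without a signal peptide or fusion tag) the 77.5kDa figure refers to, so no particular result is claimed here.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def protein_mass_da(sequence: str) -> float:
    """Approximate average mass of an unmodified polypeptide.

    Whitespace, digits and the terminal '*' of a listing are ignored;
    any other non-standard letter (such as 'X') raises ValueError.
    """
    residues = [aa for aa in sequence.upper() if aa.isalpha()]
    unknown = sorted(set(residues) - set(RESIDUE_MASS))
    if unknown:
        raise ValueError(f"no mass available for residues: {unknown}")
    return sum(RESIDUE_MASS[aa] for aa in residues) + WATER


# First row of the ORF23-1 listing <SEQ ID 666>, used only as sample input.
fragment = "MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN"
print(f"{protein_mass_da(fragment) / 1000:.2f} kDa for the first 50 residues")
```

Passing the full 725-residue listing to the same function gives an estimate for the unprocessed polypeptide.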

Example 80

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 673>: 

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC 

151 AGCGTCAgcA CGCCTGCTTC GGCGgcGgCa ATCATACCTT CGTCTTCGGA

201 AACGGGGATA AACGcGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TnTTCAAGAA TGCGTGCCAC 

351 TnAGTCGCCG ACGGGG. . 

This corresponds to the amino acid sequence <SEQ ID 674; ORF24>:

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI XSRMRATXSP TG..
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Further work revealed the complete nucleotide sequence <SEQ ID 675>: 

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC 

5 151 AGCGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA 

201 AACGGGGATA AACGCGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCTT? TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGTGCCAC 

351 TGAGTCGCCG ACGGCGGGGG TCGGCGCCAG CGACAAGTCG AGAATACCAA 

10 401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG 

451 CGGGT AATTT TGAAAGCAGT TTTCTTCACT ACTTCCGCAA CTTCGGTCAA 

501 TGTCGTTGCA TCTGAATTTT CCAACGCGGC TTTTACGACA CCTGGGCCGG 

551 ATACGCCGAC ATTGATAACG GCATCCGCTT CGCCCGAACC ATGAAACGCG 

601 CCCGCCATAA ACGGGTTGTC TTCCACCGCG TTGCAGAACA CGACAATTTT 

651 AGCGCAGCCG AAACCTTCGG GCGTGATTTC CGCCGTGCGT TTGACGGTTT

701 CGCCCGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTACTGCCG

751 ATATTGATGG AGCTGCACAC AATATCGGTA GTCTTCATCG CTTCGGGAAT

801 GGAGCGGATT AACACCTCAT CCGAAGGCGA CATCCCTTTT TGCACCAACG 

851 CGGAAAAACC GCCGATAAAA GACACACCGA TGGCTTTGGC AGCTTTATCC 

901 AAAGTTTGCG CCACGCTGAC GTAA

This corresponds to the amino acid sequence <SEQ ID 676; ORF24-1>:

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT
151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA
201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LTVSPASLTA SILIPARVLP
251 ILMELHTISV VFIASGMERI NTSSEGDIPF CTNAEKPPIK DTPMALAALS
301 KVCATLT*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF24 shows 96.4% identity over a 307 aa overlap with an ORF (ORF24a) from strain A of
N. meningitidis:

10 20 30 40 50 60 

orf 24a pep MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA 

35 r | | | | 1 | | I f 1 I 1 I I I 1 1 I I 1 I 1 I I 1 I I I I 1 1 1 I t I t I I I I I I I s I I I I I = I I I I I I I I 1 

orf24 MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 
10 20 30 40 50 60 

70 80 90 100 110 120 

40 orf24a.pep II PSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPI SSRMRATESP 

mill iiiMiimmiimiMiiiimmHiimiiiiiimmin 

orf24 1 1 PS SSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPI SSRMRATES P 

70 80 90 100 110 120 

45 130 140 150 160 170 180 

orf 2 4a . pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

I I 1 I i I I I I I I I 1 I t I I M i 1 1 I I I M M I 1 ! I I 1 I i I I I I I I I M f M lillll 

orf 24 TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSKAAFTT 
130 140 150 160 170 180 

50 190 200 210 220 230 240 

orf 24a. pep PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA 
I I 1 t I I t t I I I I I I 1 1 I I 1 I I 1 I ltlt:!lMltlll!ltll:lll lit MMIMI 
orf 24 PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 

55 190 200 210 220 230 240 

250 260 270 280 290 300 

or f 2 4 a . pep S I LI PARVLPILMELHTI S WFI ASGMERXNTSSEGDI PFCTS AEKPPIKDTPMALAALS 
MMMimilMMIMIIIimill HiMiMim:IIMIIIMMM!llt 
AH 0 rf24 SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 

250 260 270 280 290 300 
orf24a.pep KVCATLTX
           ||||||||
orf24      KVCATLTX
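
The identity percentages quoted for alignments of this kind are, in essence, the share of aligned columns in which both rows carry the same residue. The following is a minimal sketch under that assumption. The source does not state how the original alignment program scored undetermined residues (X) or stop positions, so here they are compared as ordinary characters, and the function name `percent_identity` is illustrative only.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns in which both rows carry the same residue.

    Both rows must already be aligned (equal length, '-' for gaps).
    Columns containing a gap never count as a match but do count
    towards the overlap length.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    if not a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Residues 31-40 of ORF24a and ORF24, taken from the alignment above.
print(percent_identity("GTAIISXPTE", "GTAIISKPTE"))  # 90.0
```

Run over the full aligned rows rather than a ten-residue window, the same calculation gives a figure comparable to those quoted in the text, subject to the scoring assumptions noted above.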

The complete length ORF24a nucleotide sequence <SEQ ID 677> is:

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG TGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA NCCGACCGAA CAAACGGCGG TCATCGCTTC GAGTTTATCC 

151 AACGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA 

201 NACGGGGATA AACGCGCCAC TCAAACCGCC AACCGCGCTC GAAGCCATCA

251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG

301 CCGTGCGTAC CGCAGACGCT CAAACCCATT TCTTCAAGAA TGCGCGCCAC 

351 CGAGTCGCCG ACGGCAGGGG TCGGTGCCAG CGACAAGTCG AGAATACCAA 

401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG 

451 CGGGTAATTT TGAAGGCGGT TTTCTTCACA ACTTCGGCAA CTTCGGTCAA

501 TGTCGTTGCA TCCGAATTTT CCAACGCGGC TTTTACGACA CCCGGGCCGG

551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCTGAGCC GTGAAACGCG 

601 CCCGCCATAN ACGGGTTGTC TTCCNCCGCG TTGCAGAACA CGACGATTTT 

651 GGCGCAGCCG AAACCTTCTA GTGTGATTTC ANCCGTGCGT TTGATGGTTT 

701 CGCCCGCCAG TCTGACCGCG TCCATATTGA TACCGGCGCG CGTACTGCCG

751 ATATTGATGG AGCTGCACAC GATATCAGTA GTCTTCATCG CTTCGGGAAT

801 GGAACGGATN AACACCTCGT CAGAAGGCGA CATACCTTTT TGCACCAGCG 

851 CGGAAAAGCC GCCAATAAAA GACACGCCGA TGGCTTTGGC AGCCTTATCC 

901 AAAGTTTGCG CCACGCTGAC GTAA 

This encodes a protein having amino acid sequence <SEQ ID 678>:

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISXPTE QTAVIASSLS
51 NVSTPASAAA IIPSSSXTGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT
151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA
201 PAIXGLSSXA LQNTTILAQP KPSSVISXVR LMVSPASLTA SILIPARVLP
251 ILMELHTISV VFIASGMERX NTSSEGDIPF CTSAEKPPIK DTPMALAALS
301 KVCATLT*

It should be noted that this protein includes a stop codon at position 198. 
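
An in-frame stop of this kind can be located by scanning the translated sequence for a stop symbol that is not the final character. The sketch below is illustrative only: it assumes the protein is written with * for stop codons, as in these listings, and the helper name `internal_stops` is hypothetical.

```python
def internal_stops(protein: str) -> list[int]:
    """Return 1-based positions of '*' that are not the final character."""
    # Remove the block spacing used in the listing.
    seq = "".join(protein.split())
    return [i + 1 for i, aa in enumerate(seq[:-1]) if aa == "*"]


# Residues 191-200 of the sequence above, followed by a stand-in terminal stop.
window = "ASASPEP*NA" + "*"
print([190 + p for p in internal_stops(window)])  # [198]
```

The terminal stop is deliberately excluded, so only a stop that interrupts the reading frame is reported.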
ORF24a and ORF24-1 show 96.4% identity in 307 aa overlap: 

-i* 10 20 30 40 50 60 

orf24a pep mrtawlllimpmaassammpemvcagvspgtaiisxpteqtaviasslsnvstpasaaa 
o .pep i i 1 1 i I I t I I 1 1 I I I 1 1 1 1 1 1 1 1 1 1 1 1 1 I f 1 1 I I I I I M 1 1 1 1 1 1 1 1 1 I : I I I I i * I « « 
orf24-l MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf24a.pep HPSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPI SSRMRATESP 

0 p P null I 1 I I I M I 1 1 1 1 1 1 1 M 1 1 1 1 I I 1 1 II II I II I ii n I II I M 1 1 1 1 1 1 mm 

1 1 PS S SETGIN APLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPI SSRMRATES P 



40 



orf24-l 



45 70 80 ~ 90 100 110 120 

orf24a.pep 



130 140 150 160 170 180 

TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 
1 1 U M M M M M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M M M II M II I M I II I I II I M I II I Ml 
50 orf24-l TAGVGAS DKSRI PNG I FS I FEASRPMSSPTRVI LKAVFFTTSATSVNWASEFSNAAFTT 

130 140 150 160 170 180 

190 200 210 220 230 240 

PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA 



55 ITf t | t I I I I I I t I t 1 I 1 1 1 1 1 1 III I: III Ml MM IMIMI I Ml Mill Ml 

orf24-l pGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

60 orf24a Deo SILIPARVLPII^LHTISWFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS 

50 orf24a.pe P | f | |||tl lllflllllltlllllllltt 1 1 M I M II II I : M 1 1 I II I M M I M II 

orf24-l SILIPARVLPI1WELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 

250 260 270 280 290 300 
orf24a.pep KVCATLTX
           ||||||||
orf24-1    KVCATLTX

Homology with a predicted ORF from N. gonorrhoeae

ORF24 shows 96.7% identity over a 121 aa overlap with a predicted ORF (ORF24ng) from
N. gonorrhoeae:

10 orf24 .pep MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 60 

1 I I I I I I 1 I I 1 I f I 1 I 1 I f I t I 1 i t I I t I I I I ) I - I 1 1 I I I 1 I 1 I I I 1 I I 1 t • K 1 I I t t t 
orf24ng MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIMSKPTEQTAVMASSLSSVNTPASAAA 60 

orf 24 .pep IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPIXSRMRATXSP 120 

15 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 i 1 1 1 i 1 1 1 i 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 

orf24ng IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 120 

orf 2 4. pep TG 122 
20 orf24ng TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 180 

The complete length ORF24ng nucleotide sequence <SEQ ID 679> is: 

1 ATGCGCACGG CGGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GSCGATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 

101 TCATGTCCAA ACCAACGGAG CAGACGGCGG TCATGGCTTC GAGTTTGTCC 

151 AGCGTCAACA CGCCTGCCTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA

201 AACGGGGATA AACGCGCCGC TCAAACCGCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGCGCCAC 

351 CGAGTCGCCG ACGGCGGGGG TCGGTGCCAG CGACAAATCG AGAATGCCGA 

401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GACCGATGAG TTCGCCCACG

451 CGGGTGATTT TGAAAGCGGT TTTCTTCACG ACTTCGGCGA CCTCGGTCAG

501 GCTGACCGCG TCCGAATTTT CCAGCGCGGC TTTGACCACG CCTGGACCGG 

551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCCGAGCC GTGGAACGCA 

601 CCCGCCATAA ACGGATTGTC TTCCACCGCG TTGCAGAACA CGACGATTTT 

651 GGCGCAGCCG AAACCTTCGG GTGTGATTTC AGCCGTGCGT TTGATGGTTT

701 CGCCTGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTGCTGCCG 

751 ATATTGATGG AGCTGCACAC GATATCGGTA GTTTTCATCG CTTCGGGAAC 

801 GGAACGGATC AACACCTCAT CCGAAGGCGA CATACCTTTT TGCACCAGCG 

851 CGGAAAAGCC GCCGATAAAG GACACGCCGA TGGCTTTGGC TGCCTTGTCC 

901 AAAGTCTGCG CCACGCTGAC ATAA

This encodes a protein having amino acid sequence <SEQ ID 680>: 

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIMSKPTE QTAVMASSLS

51 SVNTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV

101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RMPNGIFSIF EASRPMSSPT

151 RVILKAVFFT TSATSVRLTA SEFSSAALTT PGPDTPTLIT ASASPEPWNA

201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LMVSPASLTA SILIPARVLP

251 ILMELHTISV VFIASGTERI NTSSEGDIPF CTSAEKPPIK DTPMALAALS

301 KVCATLT*

ORF24ng and ORF24-1 show 96.1% identity in 307 aa overlap: 

50 10 20 30 40 50 60 

orf 24-1. peo ^TAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 
I M I I M I I I I I I I i I 1 1 ! 1 II I I I I I I I I I I I I : I I I I I I I i I I I M I I I I : I I I I I I I 
orf24ng MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIMSKPTEQTAVMASSLSSVNTPASAAA 

10 20 30 40 50 60 

55 

70 80 90 100 110 120 

orf 24-1. pep 1 1 PS SSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPI SSRMRATESP 

II I II I I I I I I I I II I I I I M I M I I I I I I I I I I I I I M I I N I M I I I I 

orf24ng 1 1 PSSSETGINAPLKPPT ALEAIMPPFFT AS FSNAKAAWPCVPQTLKPI SSRMRATESP 

60 70 80 90 100 110 120 
130 140 150 160 170 180

TAGVGASDKSRlPNGIFSIFEASRPMSSPTRVlL^ v ^^TSATSVNWASEFSN^FyT 

m 1 1 i! i ii ii 1 1 1 1 1 1 1 J 1 1 *lUiiiiiL;i!IUUii^ 



orf24ng T^SDK^^ 

190 200 210 220 230 240 

orf24-l pep PGPDTPTLITASASPEPXNAPAINGLSSTA^ 

1 1 1 1 1 1 1 1 » ii i n 1 1 ii mi hi .in ! it 1 1 1 1 mi i JLJULLiJLLL Ii il J ^,1111111 

LIT* 
190 

250 260 270 280 290 300 

SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 



1 0 orf 24na PGPDTPTLITASASPEPWNAPAINGLSSTALQNTTILAQPKPSGVISAVRLMVSPASLTA 

AV y - 200 210 220 230 240 



15 | | || |1 1 | l I 11 II M M M M I I 1 I I I M M M M M I M : II II I I I II I I I 11 II I 

y 250 260 270 280 290 



orf24-1.pep KVCATLTX
            ||||||||
orf24ng     KVCATLTX

Based on this analysis, including the presence of a putative leader sequence (first 18 aa - double-underlined)
and putative transmembrane domains (single-underlined) in the gonococcal protein,
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 81 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 681>:

1 ACCGACGTGC AAAAAGAGTT GGTCGGCGAA CAACGCAAGT GGGCGCAGGA

51 AAAAATCAGC AACTGCCGAC AAGCCGCCGC GCAGGCAGAC CGGCAGGAAT 

101 ACGCCGAATA CCTCAAGCTG CAATGCGACA CGCGGATGAC GCGCGAACGG 

151 ATACAGTATC TTCGCGGCTA TTCCATCGAT TAG 

This corresponds to the amino acid sequence <SEQ ID 682; ORF25>:

1 ..TDVQKELVGE QRKWAQEKIS NCRQAAAQAD RQEYAEYLKL QCDTRMTRER

51 IQYLRGYSID *

Further work revealed the complete nucleotide sequence <SEQ ID 683>:

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG 

51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAAGGCAT ACGCGGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT

151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT 

201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGT TGTACGGGGA 

351 AACTGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGTCAAAGAC 

451 GGTCAGACGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT 

501 GTCTGCCGCG CTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG

551 GCAAGGCGGT GAAAAAAGAA GACGCGGTCA GGATTTTGAG CGGAAAAGCC 

601 CGTGAAGAAG AACCGTCCAA ACCCACGCCC GAAGACATTT TGGAACACAA

651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCGCCCG 

701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AGCGTGCGGA

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG

851 AGTTGGTCGG CGAACAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 
This corresponds to the amino acid sequence <SEQ ID 684; ORF25-1>:

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQGIRGN IQETLTQEAR

51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP

101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD

151 GQTAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRILSGKA

201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDGERADTVT

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF25 shows 98.3% identity over a 60aa overlap with an ORF (ORF25a) from strain A of
N. meningitidis:



                                                10        20        30
orf25.pep                               TDVQKELVGEQRKWAQEKISNCRQAAAQAD
                                        |||||||||| |||||||||||||||||||
orf25a    VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNCRQAAAQAD
         250       260       270       280       290       300

                  40        50        60
orf25.pep RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
          |||||||||||||||||||||||||||||||
orf25a    RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
         310       320       330

The complete length ORF25a nucleotide sequence <SEQ ID 685> is: 

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG

51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAANGCAT ACGCNGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACNG CANGCAGTTT GTCGATGCCG ACNAAATTAT 

201 CGCCGCCGCC TANGNTNNGN NGNTNTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTNTCGCCG ATTTGAACAT TACCGTGCCG

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGC TGTACGGGGA 

351 AACCGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTACC CGTCAAAGAC 

451 GGTCAGANGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT 

501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT AAAAAAAGAA GACGCGGTCA GGATTNTGAG CNGANAAGCC 

601 CGTGAANAAG AACCGTCCAA ANCCNNGCCC GAAGACATTT TGGAACATAA 

651 TGCCGCCGGA GGGGATGCAG ACGTACCCCA AGCCGGAGAA GACGCGCCCG 

701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGN GTACAAAACC AGCGTGCGGA

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAANAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC 

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG

This encodes a protein having amino acid sequence <SEQ ID 686>: 

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQXIRXN IQETLTQEAR

51 SFAREDXXQF VDADXIIAAA XXXXXSLEHA SETQEGGRTF CXADLNITVP 

101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD 

151 GQXAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRIXSXXA

201 REXEPSKXXP EDILEHNAAG GDADVPQAGE DAPEPEILHP DDGERADTVT 

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEXR KWAQEKISNC 

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 

ORF25a and ORF25-1 show 93.5% identity in 338 aa overlap: 

cc 10 20 30 40 50 60 

orf25a pep hfYRKLIALPFALLLAACGREEPPKALECANPAVLQXIRXNIQETLTQEARSFAREDXXQF 

P P I I I t I It I 1 1 I I 1 1 1 1 1 1 1 i I I I 1 1 t 1 1 1 I H IIIMIIMIIMMM I] 

orf25-l MYRKLIALPFAlJiAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF 
10 20 30 40 50 60

70 80 90 100 110 120 

VDADXIIAAAXXXXXSLEHASETQEGGRTFCXADLNITVPSETLADAKANSPLLYGETAL 

| I I I tllll | | | | | I I I I 1 I I I I I I IIIIIMMlMiUIMIIMIItlll 

VDADKIIAAAYGIAFSLEHASETQEGGRTFCIADLNITVPSETIADAKANSPLLYGETAL 

70 80 90 100 HO 120 



130 140 ISO 160 1^0 180 

10 orf25a neo SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQXAFVDNTVGMAAQTLSAALLPYGVKSIV 

IU or£Z5a.pep | | | 1 1 1 1 1 1 1 | | | 1 1 | | | | | I I I 1 1 1 1 I 1 1 1 1 : 1 1 I I I I I M H ' I I I I I I I 1 1 1 1 1 III 

orf?s-i SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV 

130 140 150 160 170 180 

15 190 200 210 220 230 240 

orf 25a . pep MIDGKAVKKEDAVRIXSXXAREXEPSKXXPEDILEHNAAGGDADVPQAGEDAPEPEILHP 
P P | | | M 1 1 1 1 I 1 I 1 1 I I III Mil : 1 1 1 M 1 1 1 I I M 1 I 1111:1 111111111 
or f 2 5- 1 MI DGKAVKKEDAVRI LSGKAREEEPSKPT PEDI LEHN AAGGDAGVPQAAEGAPEPE I LHP 

190 200 210 220 230 240 



20 



250 260 270 280 290 300 

orf 25a oeD DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNC 
* F | 1 1 | | t I 1 f 1 i I I 1 1 1 1 I I I I I 1 I I I 1 I 1 I I I 1 I I i I 1 I I 1 I I I I k t I llllllllltl 

DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 



orf25-l 



25 250 260 270 280 290 300 

310 320 330 339 

orf 25a . pep RQAAAQADRQEYAE YLKLQC DTRMTRERIQY LRGYS I DX 

I I I I I I I I 1 1 I I I I 1 I I 1 I I 1 I I I I t I I t I I I I I I I I I 1 
30 orf 25-1 RQAAAQADRQEYAE YLKLQC DTRMTRER IQY LRGYS I DX 

310 320 330 

Homology with a predicted ORF from N. gonorrhoeae

ORF25 shows 100% identity over a 60aa overlap with a predicted ORF (ORF25ng) from
N. gonorrhoeae:

orf25.pep                               TDVQKELVGEQRKWAQEKISNCRQAAAQAD  30
                                        ||||||||||||||||||||||||||||||
orf25ng   VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNCRQAAAQAD 308

orf25.pep RQEYAEYLKLQCDTRMTRERIQYLRGYSID  60
          ||||||||||||||||||||||||||||||
orf25ng   RQEYAEYLKLQCDTRMTRERIQYLRGYSID 338

The complete length ORF25ng nucleotide sequence <SEQ ID 687> is: 

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCAGCGTG 

51 CGGCAGGGAA GAACCGCCCA AGGCGTTGGA ATGCGCCAAC CCCGCCGTGT

101 TGCAGGACAT ACGCGGCAGT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT 

201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CGAGGCAAAC AGCCCCCTGC TGTATGGGGA

351 AACGTCTTTG GCAGACATCG TGCAGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGCCAAAGAC 

451 GCTCGGACGG CATTTATCGA CAACACGGTC GGTATGGCGA CGCAAACGCT 

501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT GACAAAAGAA GACGCGGTCA GGGTTTTGAG CGGCAAAGCC

601 CGTGAAGAAG AACCGTCCAA ACCCACCCCC GAAGACATTT TGGAACACAA 

651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCACCCG 

701 AACCCGAAAT CCTGCATCCC GACGACGTCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AACGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG

851 AGTTGGTCGG CGAACAGCGC AAGTGGGCGC AGGAAAAAAT CAGcaactgc 

901 cgACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTCCAATGC GACACGCGGA TGACGCGCGA ACggaTACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 

This encodes a protein having amino acid sequence <SEQ ID 688>:

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQDIRGS IQETLTQEAR

51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP

101 SETLADAEAN SPLLYGETSL ADIVQQKTGG NVEFKDGVLT AAVRFLPAKD

151 ARTAFIDNTV GMATQTLSAA LLPYGVKSIV MIDGKAVTKE DAVRVLSGKA

201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDVERADTVT

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID*

ORF25ng and ORF25-1 show 95.9% identity in 338 aa overlap: 

10 20 30 40 50 60 

10 o-f25-l Pep MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF 

* P ^ | 1 1 I I I 1 | 1 | | 1 | t I 1 t t I ! I I 1 1 I I 1 1 1 1 I 1 1 I I I I 1 s I 1 I I 1 1 I 1 1 1 1 1 1 I I I I III 
orf25nq MYRKLIALPFALLLAACGREEPPKALECANPAVLQDIRGSIQETLTQEARSFAREDGRQF 

10 20 30 40 50 60 

15 70 80 90 100 110 120 

orf 25-1 pep VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL 

I I | | | I | M I I 1 I I I I 1 I I I I M 1 I i ! I I 1 I M M I M M I I : ! I ! I I I I I M : t 

orf25nq VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAEANSPLLYGETSL 
9 70 80 90 100 110 120 

20 

130 140 150 160 170 180 

orf 25-1 Pep SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV 
r 1 1 | : | I 1 I 1 I I I i I 1 1 1 1 1 1 I 1 1 1 I I r | | 1 t t = I I I 1 1 I t : I 1 I 1 1 I 1 1 1 1 1 1 1 1 I I 
orf25na ADIVQQKTGGNVEFKDGVLTAAVRFLPAKDARTAFIDNTVGMATQTLSAALLPYGVKSIV 
25 orizong ^ ^ ^ i6Q nQ 18Q 

190 200 210 220 230 240 

orf 25-1 pep MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 
1IIIMI I I I i I 1 : ) 1 M I I I M M t 1 I 1 I I M 1 I I I 1 I I I I I I t I 1 I I 1 t 1 3 I ! ! 1 i I 
30 or^25na MIDGKAVTKEDAVRVLSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 

J 9 190 200 210 220 230 240 

250 260 270 280 290 300 

orf 25-1 pep DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 

35 ' it mi ii mi ii M iMiimi n ii m ii ii mmi i mimiiiimi 

orf25na ddVERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 
* 250 260 270 280 290 300 

310 320 330 339 

40 or f 25-1 . pep RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 

I | | | | | I | | I I II I I I I I I I I I I I I I I I I I I I i I M H I 
orf25ng RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 
310 320 330 

Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein
lipid attachment site (underlined) in the gonococcal protein, it was predicted that the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.

ORF25-1 (37kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
16A shows the results of affinity purification of the GST-fusion protein, and Figure 16B shows the
results of expression of the His-fusion in E.coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 16C), ELISA (positive result), and FACS
analysis (Figure 16D). These experiments confirm that ORF25-1 is a surface-exposed protein, and
that it is a useful immunogen.
Figure 16E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF25-

Example 82 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 689>:

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGwysGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGsyGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CkGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA T 

851 AC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT CTTTGCCGTC GTTCTCTGCA CGCTCGGCAC 

951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC

1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAA. . 

This corresponds to the amino acid sequence <SEQ ID 690; ORF26>:

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILXX VAFLVGGNPV
51 DGLTHLKDMV VGLAWSDXDW SLGKPKILVF XILLGIFTSL LTYSGSN...

251 TSLV

301 FGGTCGVFAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP 

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI 

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD 

501 KK.. 

Further work revealed the complete nucleotide sequence <SEQ ID 691>:

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC 

401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 

451 CGCACCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCTC CTATGTGCGT

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT 

651 GTTCGTCGTC GCATGGTTTT CCTTCGACAT CGGCTCGATG GCACGTTTCG 

701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGCT

751 ACCAAAGGTC GTGTTTACGC ACTGATTATT CCCGTTTTGG CCTTAATCGC 

801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGGGCATTT GAAAACACGG ACGTAAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT CCTTGCCGTC GTTCTCTGCA CGCTCGGCAC 

951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 
1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT

1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCAACGCCTG A 

This corresponds to the amino acid sequence <SEQ ID 692; ORF26-1>:

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV

51 DGLTHLKDMV VGLAWSDGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF

101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS

151 RTKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDA

251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV

301 FGGTCGVLAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD

501 KKRANA*

Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical transmembrane protein HI1586 of H. influenzae (accession number P44263)

ORF26 and HI1586 show 53% and 49% amino acid identity in 97 and 221 aa overlap at the
N-terminus and C-terminus, respectively:

0rf26 1 MQL IDYSHSFFSWPP FLALALA V I TRR VXXXXXXXXXXX V AFLVG G N P V DG LTH LK DMV 60 

M+LID+S S +S+VP LA+ LA+ TRRV L +L V 

HI1586 14 MELI DFSSSVWS I VPALLAI I LAI ATRRVLVSLS AGI I IGSLMLSDWQIGS AFN YLVKNV 73 

25 Orf26 61 VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 97 

V L ++D + + I++F +LLG+ T+LLT SGSN 

HI 1586 7 4 VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSN 109 



30 



// 

Orf26 86 IFTSLLTYSGS — NTSLVFGGTCGVFAWLCTL — GTIKTADY PKAVWQGAKSMFGXXXX 141 

+F+ L T+ + TSLV GG C + L + + +Y ++ G KSM G 
HI1586 299 VFSVLGTFENTWGTSLWGGFCSIIISTLLIILDRQVSVPEYVRSWIVG^KSMSGAIAI 358 



35 0rf26 142 XXXXXXXSTWGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLP 201 

+ +VG+M TG YLS+LV+GNI FLPVILF+L + MAF+TGTSWGTFGIMLP 
HI1586 359 LFFAWTINKIVGDMQTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLP 418 

0rf26 202 IAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQXXXX 261 
40 IAAAMA P L++PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 

HI 1586 419 IAAAMAANAAPELLLPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYA 478 

Orf26 262 XXXXXXXXXXXXXXXXXKSALLGFGTTGIVLAVLIFLLKDK 302 

S L GF T + L V+IF +K + 
45 HI1586 479 ATVATATS IGY I WG FT Y SG LAG FAAT AVS L I V 1 1 FAVKKR 519 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF26 shows 58.2% identity over a 502aa overlap with an ORF (ORF26a) from strain A of
N. meningitidis:

50 10 20 30 40 50 60 

or f 2 6 . pep MQLIPYSHSFFSWPPFLALA LAVITRR VLLSLGIGILXXVAFLV GGNPVDGLTHLKDMV 
I I M I I I I ! I I I I t II I I I I I II I I I I I I I I I I I I I I I I I I M I I I I I I I I i ! I II I I 
or f 2 6a MQLIDYSHSFFSWPPFLALA LAVITRR VLLSLGIGILVGVAFLV GGNPVQGLTHLKDMV 
10 20 30 40 50 60 

55 

70 80 90 99 

or f 2 6 . pep VGLAWSDXDWSLGKP KILVFXILLGIFTSLLTY SGSNXX 

I MUM lllllll! Ill II | | M I M II I I I I 1 
orf26a VGLAWSDGDWSLGKPK XLVFLILLGIFTSLLTY SGSNQAFADWAKRHIKN RRGAKMLTAC 
60 70 80 90 100 110 120 
orf26.pep
orf26a 



LVFVTFID DYraSlAVCa^ 
160 170 1HU 



130 



140 






orf26.pep 
orf26a 

orf26.pep 
orf26a 

orf26.pep 
orf26a 

orf26.pep 
orf26a 

orf26.pep 
orf26a 



T UU;U,VT YKITEYTrMCT^ 

— 210 220 230 



190 



200 



100 



110 



-TSLV 
I I I I 



MDETAVSDGSWGRVYALI^ 



250 



260 



270 



260 



300 



160 



170 



120 130 140 150 

FGGTCGVFAWLCTL GTIKTADYPKAWQGAKSM^^ 

330 340 350 360 



310 



320 



220 



230 



180 190 200 210 

qtt.v^NTHPGFLP VILFLLASVMAFA TGTSW GTFGIMLPIAAAMAVKV EP ALIIPCMSA 

I i I I i i I I I mi 1 1 | | I | I t I I I I 1 1 I I I M II I I I I I I I I I M I M : I : I I I I M I I 

STLVAGNIHPG j^ 

— 280 390 400 410 420 



370 



280 



290 



240 250 260 270 

MirLCJl'/CCDIlC? FT c ir > T n<TT gg ^^ ap ^ MHTr>H ^TSQLPYALTVAAAAASGYLALGIiTKSA 

rTTTTTTT i 1 11 1 1 1 1 1 1 1 H 1 i 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 II 1 1 1 Ml 

V^(^VCGDHCSPIS 

— — — 450 460 470 480 



430 



440 



300 310 
or f 2 6. pep LLGFGTTGIVLAVLIFL LKDKK 
U 111:11 HUM HUM! I 
orf2 6a LLG FGXTGI VLAVLI FL LKDKKRANAX 

490 ~ 500 



The complete length ORF26a nucleotide sequence <SEQ ID 693> is:

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA

201 CGGCGATTGG TCGCTGGGCA AACCAAAANT CTTGGTTTTC CTGATACTTT

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC

401 TCGCCGTCGG TGCGNTTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC

451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCGC CTATGTGCGT

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG

551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT

601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT

651 GTTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGATG GCACGTTTCG

701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGGC

751 AGCTGGGGCA GGGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC

801 CTCAACGGTT TCCGCCATGA TCTACACCGG TGCACAGGCA AGCGAAACCT

851 TCAGCATTTT GGGTGCATTT GAAAATACGG ACGTGAACAC TTCGCTGGTA

901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGCTCGGCAC

951 GATTAAAATC GCCGATTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCCA

1001 TGTTCGGCGC AATCGCCATT TTAATCCTTG CCTGGCTCAT CAGTACGGTT

1051 GTCGGCGAAA TGCACACAGG CGACTACCTC TCCACGCTGG TTGCGGGCAA

1101 CATCCATCCC GGCTTCCTGN CCGTCATCCT TTTCCTGCTC GCCAGCGTGA

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT CATGCTGCCG

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAT CCCTCACTGA TTATCCCGTG

1251 TATGTCCGCC GTGATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC




1351 GACCACGTTA CNTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

1401 CGCATCGGGN TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGTT 

1451 TTGGCANGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCAACGCCTG A 

This encodes a protein having amino acid sequence <SEQ ID 694>:

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV

51 DGLTHLKDMV VGLAWSDGDW SLGKPKXLVF LILLGIFTSL LTYSGSNQAF

101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAXA RPVTDKFKVS

151 RAKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDG

251 SWGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV

301 FGGTCGVLAV VLCTLGTIKI ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLXVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVD PSLIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGXTGIV LAVLIFLLKD

501 KKRANA*
ORF26a and ORF26-1 show 97.8% identity in 506 aa overlap: 



20 



10 20 30 40 50 60 

or f 2 6a. pep MQLIDYSHSFFSWPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 
I I M t I I M I I I I I I I I 1 I ! t I I I I I I I I I II II 1 1 1 I i I I 1 1 1 I II i I I 1 1 I I I I M I I 
orf26-l MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 

10 20 30 40 50 60 






70 80 90 100 110 120 

or f 26a . pep VGLAWSDGDWSLGKPKXLVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC 
I 1 ! I t I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I II I I 1 
orf26-l VGLAWSDGDWSLGKPKILVFL I LLGIFTSLLTYSGSNQAFADWAKRHIKN RRGAKMLTAC 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf26a.pep LVFVTFIDDYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMPVSSWGASIIA 
I I I I i I I I I I I I I I I II I I I I I I i I I I I I I : I i I I I I I I I I I I I I I I I I I I I I I I I i I I 
or f 26-1 LVFVTFTDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf26a.pep TLAGLLVTYKITEYT PMGTFVAMSLMNYYALFALIMVFWAWFSFDIGSMARFEQAALNE 

I I i I I II II I I 1 I I I I I I II I I II I I I I II I I I I I I I I I I I I I I I II I I I I I I ! I I I I I I 
orf26-l TLAGLLVTYKITEYT PMGT FVAMSLMNYYALFALIMVFWAWFSFDIGSMARFEQAALNE 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf26a.pep AHDETAVS DGSWGHVYALI I PVLALI ASTVSAMI YTGAQASET FS I LGAFENTDVNTSLV 

I I I I I 1 1 I I : : I I I I II I I I I I 1 I I I I I I 1 1 i 1 1 I 1 1 1 1 I II 1 1 I I I I 1 1 I t I 1 1 1 1 1 I 
orf26-l AHDETAVS DATKGRVYALI I PVLALIASTVSAMIYTGAQASETFS I LGAFENTDVNTSLV 

250 260 270 280 290 300 



310 320 330 340 350 360 

orf 26a . pep FGGTCGVLAWLCTLGTIKIADYPKAVWQGAKSMFGAIAILILAWLISTWGEMHTGDYL 
50 | | I | I I i I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I 

orf 26-1 FGGTCGVLAWLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTWGEfdHTGDYL 

310 320 330 340 350 360 



370 380 390 400 410 420 

55 orf 26a . pep ST LV AGN I HPGFLXVILFLLASVMAFATGT SWGTFGIMLP IAAAMAVKVD PSLIIPCMSA 

I I I I I I I I I 1 I I I I I I I I I I I I I I I I I M I I I I I I I I I I I i I I II I I I : I : I I i I I I I I 
orf 26-1 STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 
370 380 390 400 410 420 






430 440 450 460 470 480 

orf 26a . pep VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA 
I I I I I I I I I I I I I I I II I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 26-1 VMAGAVCGDHCS PI S DTT I LSSTGARCNHI DHVTSQLP YALTVAAAAASGYLALGLTKSA 

430 440 450 460 470 480 

490 500 
orf 2 6a . pep LLG FGXTG I VLAV L I FLLKDKKRAN AX 
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orf26-l 



1 1 1 1 I : I 1 I 1 1 t 1 1 I I I 1 t i t I I I I t > 
LLG FGTTG I VLAVLI FLLKDKKRAN AX 
490 500 



Homology with a predicted ORF from N. gonorrhoeae

ORF26 shows 94.8% and 99% identity in 97 and 206 aa overlap at the N-terminus and C-terminus,
respectively, with a predicted ORF (ORF26ng) from N. gonorrhoeae:



60 
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30 



orf26.pep 
orf 26ng 
orf26. pep 
orf 26ng 

orf 26. pep 
orf26ng 
orf 2 6. pep 
orf26ng 
orf 26. pep 
orf26ng 
orf 26. pep 



MQLIDYSHSFFSWPPFIAIAI^ 

V7T 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 



97 



VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 



TSLVFGGTCGVFAWLCTLGTIKTADYPKA 326 

11111111111:1 11111:11111111111 

ASTVSAMIYTGAQASETFSILGAFENTDVNTSLVFGGTCGV1AWLCTFGTIKTADYPKA 326 

VWQGAKSMFGAIAILIIAWLISTWGEMHTGDYLSTLVA^IHPGFLPVILF^S 386 

W^A^MrcAIAILILAWLI STWGEMHTGDYLSTLVAGNIHPGFLPVILFLIASVMAF 386 

ATGTSWGTFGIMLPIAAAMAVKVEPALI I PCMSAVMAGAVCGDHCSPI SDTTILSSTGAR 4 4 6 

ATGTSWCTFGIMLPIAAA^ < * 6 



ATbi SWu 1 tunw inwvm.^ . w- . 

CNHIDHVTSQLPYALTVAAAAASGYIAU3LTKSAL1£TOTTGIVIAVLIFL1^DKK 
Ml I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I 1 1 I I I I I I I I I I I I 1 1 I I H 



orf 26ng CNH I dm vt sui-r iai.iv ,w™w« * 

The complete length ORF26ng nucleotide sequence <SEQ ID 695> is:

1 ATGCAGCTGA TTGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT
51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG
101 GCATCGGTAT TTTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC
151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGGCAGA
201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT
251 TGGGCATTTT CACTTCACTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT
301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGTGCGGCG CGAAAATGCT
351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGCC
401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC
451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCTCGC CCATGTGCGT
501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG
551 GATTGCTCGT TACCTACAAA ATTACCGAAT ACACGCCGAT GGGGACGTTT
601 GTCGCCATGA GCCTGATGAA CTATTACGCG CTGTTTGCCC TGATTATGGT
651 ATTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGAtg gCGCGTTTCG
701 AACAGGCTGC GTTGAACGAA gcccaggacg aaaccgccgc tTCAGACgCT
751 ACCAAAGGTC GTGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC
801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT
851 TCAGCATTTT GGGGGCATTT GAAAATACCG ACGTAAACAC TTCGCTGGTA
901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGTTCGGCAC
951 GATTAAAACC GCCGATTATC CCAAAGCCGT GTGGCAGGGT GCGAAATCCA
1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CCTGGCTCAT CAGTACGGTT
1051 GTCGGCGAAA TGCACACGGG CGACTACCTC TCCACGCTGG TTGCGGGCAA
1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA
1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG
1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTAtCCcGTG
1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGTTCGCCCA
1301 TCTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC
1351 GACCACGTTA CCTCGCAACT GCCTTATGCC CTGACGGTTG CCGCCGCCGC
1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT
1451 TTGGCACGAC CGGTATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT
1501 AAAAAACGCG CCGACGTTTG A



This encodes a protein having amino acid sequence <SEQ ID 696>:

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV
51 DGLTHLKDMV VGLAWADGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF
101 ADWAKRHIKN RCGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS
151 RAKLAYILDS TASPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF
201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AQDETAASDA
251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV
301 FGGTCGVLAV VLCTFGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV
351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP
401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI
451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD
501 KKRADV*

ORF26ng and ORF26-1 show 98.4% identity in 505 aa overlap: 

10 20 30 40 50 60 

orf26-l pep MQL I DYSH S FFS WPP F1ALALAV ITRRVLLS LG I G I LVGVAFLVGGN FVDGLTHLKDMV 
| | I I 1 f t I I 1 1 I I I 1 I I I I I I 1 I S I 1 I t 1 I 1 I I I I i I t I I t t I 1 I I t I I 1 ) 1 I I I 1 1 I I I 
o-f26ng MQLIDYSHSFFSWPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 

10 20 30 40 50 60 

70 80 90 100 110 120 

cr'26-1 pep VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC 
I Mlhlll til Nil 111 I I IM I III II MIIH I Mi 111 I til I III IIIIMIl 
or^26na VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 
g 70 80 90 100 110 120 

130 140 150 160 170 180 

orf26-l pep LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA 
| | 1 f 1 I I 1 I I I 1 1 I I 1 1 1 1 I t I t 1 MM 111:111 millltlMIIMIIIIIIII II 

orf26ng lvfvtfiddyfhslavgaiarpvtdkfkvsraklayildstaspmcvlmpvsswgasiia 

130 140 150 160 170 180 

190 200 210 220 230 240 

or f26- ' pep TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE 
~ MmillMlilMIIIIIIIMiillllliliimiMMIIimiMIIIIIIII 
orf26nq TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFWAWFSFDIGSMARFEQAALNE 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf26-l oep AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 
| : | I I I : I 1 I 1 I I I I I I I I I I 1 I 1 I 1 I 1 I I I I I I I M I 1! I 1 M I I I I 1 1 1 I I I I I M I 1 
o-f26na AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFEKTDVNTSLV 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf26-l pep FGGTCGVLAWLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTWGEMHTGDYL 

MMIIMMIilhltlMlllllMlilllllMlliM Illllllllll 

orf26ng FGGTCGVLAWLCTFGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTWGEMHTGDYL 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf26-l pep ST LVAGN I H PG FLPV I LFLLASVMAFATGT SWGT FG I MLP IAAAMAVKVE PALI I PCMSA 

1 1 1 1 1 1 i 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 s 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf26ng STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALII PCMSA 

370 380 390 400 410 420 

430 440 450 460 470 480 

o-f26-l pep VMAGAVCG DHCS P I S DTT I L S S TGARCN H I DHVT SQL P Y ALT VAAAAAS G Y LALG LTKS A 
| | | | | 1 | | | 11 I | | I I I I I I I i I I I I II I II II I I II I M II I 1 I I I M I I I I 1 ! I I I I I 
orf26no VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA 
430 440 450 460 47C 480 

490 500 
or f 2 6-1 .pep LLGFGTTGIVLAVLI FLLKDKKRANAX 

I 1 I 1 I I 1 I 1 I t I I I I I t I 1 I I I I i i = 
orf26ng LLGFGTTGIVLAVLIFLLKDKKRADVX 
4 90 500 

In addition, ORF26ng shows significant homology to a hypothetical H. influenzae protein:




sp|P44263|YF86_HAEIN HYPOTHETICAL PROTEIN HI1586 >gi|1074850|pir||C64037

pZ^S&%*- Haemophilus influenzae (strain Rd KW20) ^il^JJS^J- 
influenzae predicted coding region HI1586 [Haemophilus influenzae) Length 519 

Identities 3 ! <«%). « ^ 7/507 

Query: 1 M0 LIDYSHSFFSWPPFIALMAVITRRXXXXXXXXXXXXXAFLVGGN 60 

Sbjct: 14 SKS^ 

Query: 61 VGLAWADGDWSLGKPKILVFLILIXSIFTSLLTYSGSNQAFAD^ 120 

Query, bi ^ + i ++FL+LLG+ T+LLT SGSN+AFA+WA+ IK R GAK+L A 

Sbjct: 7 4 VSLWADGEIN-SNMNIVLFLLLLGVLTALLTVSGSNRAFAEWAQSRIKGRRGAKL^ 132 

Query: 121 LVFVT FI DDYFHSLAVGAIARPVTDKFKVSRAKLAY ILDSTAS ^ 180 
Query. LVFVT FI DDYFHSLAVGAIARPVT D+FKVSRAKLAY I LDST A+PMCV+MPVS SWGA II 

Sbjct: 133 LVFVTFIDDYFHSIAVGAIARPVTDRFKVSRAKLAYILDSTAAPMCVMMPVSSWGAYIIT 192 

Ouerv 181 TIAGLLVTYKITEYTPMGTFVAMSI^YYALFALIMVFVVAWFSFDIGSMARFEQAALNE 240 

V * + GLL TY ITEYTP+G FVAMS MN+YA+F++IMVF VA+FSFDI SM R E+ AL 

Sbjct: 193 LIGGL1JVTYSITEYTPIGAFVAMSSMNFYAIFSIIWFFVAYFSFDIASMVRHEKLALKN 252 

Ouerv 241 AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQA SETFSILGAFENTDVN 296 

+D4 TKG+V LI+P+L LI +TVS MIYTGA+A + FS+LG FENT V 

Sbjct: 253 TE DQLEEETGTKGQVRNLI LPI LVLI I ATVSMMI YTGAEALAADGKVFSVLGT FENTWG 312 

Ouerv 297 TSLVFGGTCGVL — AVVLCTFGTIKTADYPKAVWQGAKSMFGXXXXXXXXXXXSTWGEM 354 

y ' TSLV GG C ++ +++ + +Y ++ G KSM G + +VG+M 

Sbjct: 313 TSLWGGFCSIIISTLLIILDRQVSVPEYVRSWIVGIKSMSGAIAILFFAWTINKIVGDM 372 

Ouerv 355 HTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALI 414 
Query. ^ YLS+LV+GNI FLPVILF+L + MAF+TGTSWGTFGIMLPIAAAMA P L+ 

Sbjct: 373 QTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLPIAAAMAANAAPELL 432 

Query 415 IPCMSAVMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQXXXXXXXXXXXXXXXXXX 474 

+PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 
Sbjct: 433 LPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYAATVATATSIGYIW 492 

Query: 47 5 XXXKSALLGFGTTGIVLAVLIFLLKDK 501 

S L GF T + L V+IF +K + 
Sbjct: 4 93 GFTYSGLAGFAATAVSLIVIIFAVKKR 519 

Based on this analysis, it is predicted that these proteins from N.meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 83 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 697>: 

1 . AAGCAATGGT ATGCCGACGN .AGTATCAAG ACGGAAATGG TTATGGTCAA

51 CGATGAGCCT GCCAAAATTC TGACTTGGGA TGAAAGCGGC CGATTACTCT

101 CGGAACTGTC TATCCGCCAC CATCAACGCA ACGGGGTGGT TTTGGAGTGG 

151 TATGAAGATG GTTCTAAAAA GAGCGAAGT. GTTTATCAGG ATGACAAGTT 

201 GGTCAGGAAA ACCCAGTGGG ATAAGGATGG TTATTTAATC GAACCCTGA 

This corresponds to the amino acid sequence <SEQ ID 698; ORF27>: 

1 ..KQWYADXSIK TEMVMVNDEP AKILTWDESG RLLSELSIRH HQRNGVVLEW
51 YEDGSKKSEX VYQDDKLVRK TQWDKDGYLI EP*
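
The correspondence between a nucleotide listing and its amino acid listing can be checked by translating the reading frame. The following Python sketch is illustrative only and is not part of the original analysis. It assumes the standard genetic code, reads from the first base given, and reports any codon containing an undetermined base (N or a dot) as X; the function name `translate` is a hypothetical helper.

```python
from itertools import product

# Standard genetic code, codons ordered over T, C, A, G (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA listing in frame 1; unresolved codons become 'X'."""
    seq = "".join(dna.split()).upper()  # drop the block spacing of the listing
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First row of the partial sequence <SEQ ID 697>, leading placeholder omitted.
row_1 = "AAGCAATGGT ATGCCGACGN .AGTATCAAG ACGGAAATGG TTATGGTCAA"
print(translate(row_1))
```

The sixteen residues obtained from this row agree with the start of <SEQ ID 698>, including the X that arises from the undetermined codon.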

Further work revealed the complete nucleotide sequence <SEQ ID 699>: 

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGAA 

101 AGCTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

151 GTGGCGGGTA TTGCGCACGC GCAGGATTTT TATTATCCGT CGATGAAGAA
201 RTATTCTGAA CCTTATATCG TTGCTTCAAC GCMATCAAA TCTTTTGTGC
251 CTACCCTGCA AAACGCTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA
301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG ^TGGGTCAA
351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGCT
401 TGAGTGAGGG TACGGGATAC CGCTATTACC GTAACGGCGG CAAGGAAAGC
451 GAAATCCAGT TTAAGCAAAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA
501 TGCCGACGGC AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG
551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTCTC GGAACTGTCT
601 ATCCGCCACC ATCAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG
651 TTCTAAAAAG AGCGAAGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA
701 CCCAGTGGGA TAAGGATGGT TATTTAATCG AACCCTGA

This corresponds to the amino acid sequence <SEQ ID 700; ORF27-1>:

1 MKKLSRIVFS TVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV
51 VAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK
101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES
151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS
201 IRHHQRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP*
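
The partial ORF27 sequence can be placed within the complete ORF27-1 sequence by a simple scan that treats X as a wildcard. The Python sketch below is an illustration under that assumption and does not reproduce the software used for the original analysis; `find_partial` is a hypothetical helper, and only residues 151-245 of ORF27-1 are included to keep the example short.

```python
def find_partial(full: str, partial: str, wildcard: str = "X"):
    """Return the 0-based offset at which `partial` matches `full`, or None.

    A position matches when the residues are equal or either one is the wildcard.
    """
    for start in range(len(full) - len(partial) + 1):
        window = full[start:start + len(partial)]
        if all(p == f or wildcard in (p, f) for p, f in zip(partial, window)):
            return start
    return None


# Residues 151-245 of ORF27-1 <SEQ ID 700>.
orf27_1_tail = (
    "EIQFKQNKAN" "GVWKQWYADG" "SIKTEMVMVN" "DEPAKILTWD" "ESGRLLSELS"
    "IRHHQRNGVV" "LEWYEDGSKK" "SEAVYQDDKL" "VRKTQWDKDG" "YLIEP"
)
# Partial ORF27 <SEQ ID 698>, 82 residues.
orf27_partial = (
    "KQWYADXSIK" "TEMVMVNDEP" "AKILTWDESG" "RLLSELSIRH" "HQRNGVVLEW"
    "YEDGSKKSEX" "VYQDDKLVRK" "TQWDKDGYLI" "EP"
)

offset = find_partial(orf27_1_tail, orf27_partial)
first_residue = 151 + offset  # convert to ORF27-1 numbering
print(first_residue, first_residue + len(orf27_partial) - 1)
```

Under these assumptions the partial sequence spans residues 164 to 245 of ORF27-1, that is, the C-terminal 82 residues.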

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF27 shows 91.5% identity over a 82aa overlap with an ORF (ORF27a) from strain A of N.
meningitidis:



or f 27. pep 



X0 20 30 

KQWYADXSIKTEMVMVNDEPAKILTWDESG 
llllll : | I I I i I I I I M 1 I I t ! I 1 1 1 M 



9C , 27a LSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVNDEPAKILTWDESG 

" 140 150 160 170 180 190 

40 50 60 70 80 

Q rf27 dpd rlLSELSIRHHQRNGWLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEPX 
orf27. P ep , | 1 1 | | [ | i | | | M 1 I I lHllllMIM.il Ml MM I 

3U orf2 -, a RLLSELSIHHHXRNGWLEWYEDGSKKXEAVYQDDKLVRKTQWDXDGYLIEPX 

200 210 220 230 240 

The complete length ORF27a nucleotide sequence <SEQ ID 701> is:

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC
51 GGCCGCTTTG CCGGCGCAGA NCTATTCTGT TTATTTTAAT CAGAACGGGA
101 AACTGACGGC GACGNTGTCT TCTGCCGCNT ATATCAGGCA ATATAGTGTG
151 GCGGAGGGTA TTGCGCACGC GCAGGANTTT TANTATCCGT CGATGAAGAA
201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC
251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA NGGTCAGAAA
301 AAAATGGCNG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA
351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGTT
401 TGAGTGAAGG TACGGGGTNN CGCTATTACC GTAACGGCGG CAAGGAAAGC
451 GAAATCCAGT TTAAACAGAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA
501 TGCCGACGGC AATATCAAAA CGGAAATGGT TATGGTCAAT GATGAGCCTG
551 CCAAAATTCT GACATGGGAT GAAAGCGGTC GATTACTCTC GGAACTGTCT
601 ATCCATCATC ATNAACGTAA TGGAGTAGTC TTAGAGTGGT ATGAAGATGG
651 TTCTAAAAAG ANTGAAGCTG TTTATCAGGA TGATAAGTTG GTCAGGAAAA
701 CCCAGTGGGA TAANGATGGT TATTTAATCG AACCCTGA

This encodes a protein having amino acid sequence <SEQ ID 702>:

1 MKKLSRIVFS TVLLGFSAAL PAQXYSVYFN QNGKLTATXS SAAYIRQYSV
51 AEGIAHAQXF XYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFXGQK
101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGX RYYRNGGKES
151 EIQFKQNKAN GVWKQWYADG NIKTEMVMVN DEPAKILTWD ESGRLLSELS
201 IHHHXRNGVV LEWYEDGSKK XEAVYQDDKL VRKTQWDXDG YLIEP*

ORF27a and ORF27-1 show 94.7% identity in 245 aa overlap:

10 20 30 40 50 60 

or f 27a . pep mkKLSRIVFSTVLLGFSAALPAQXYSVYFNQNGKLTATXSSAAYIRQYSVAEGIAHAQXF 
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in in imii minium hi 1 1 Minimi immmimi* i i 

Orf27-l MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSWAGIAHAQDF 

10 20 30 40 50 60 

70 80 90 100 HO 120 

orf27a oeo XYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFXGQKKMAGGFSKGKPDGEWVNWYP 

* p ii mm mi mi i hi in M m n 1 1 miimimimimi 

orf27-l YYPSMKKYSEPYIVASTQIKSFVPTI^NGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP 

70 80 90 100 HO 120 

130 140 150 160 170 180 

orf27a pep NGKKSAVMPYKNGLSEGTGXRYYRKGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVN 

mmmmimm 1 1 m 1 1 1 1 1 m i m m i m m 1 1 m : 1 1 1 1 1 1 1 1 1 

or f 27-1 NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf27a pep DEPAKILTWDESGRLLSELSIHHHXRNGVVLEWYEIX3SKKXEAVYQDDKLVRKTQWDXDG 

immmii icii iimiimmii immimimi n 

orf27-l DEPAKILTWDESGRLLSELSIRHHQRNGWLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG 

190 200 210 220 230 240 



orf27a.pep YLIEPX
           ||||||
orf27-1    YLIEPX

Homology with a predicted ORF from N. gonorrhoeae

ORF27 shows 96.3% identity over 82 aa overlap with a predicted ORF (ORF27ng) from
N. gonorrhoeae:

orf27 pep KQWYADXSIKTEMVMVNDEPAKILTWDESG 30 

III I I I I III III III I III I III Ml I I 
orf27ng LSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVNDEPAKILTWOESG 193 

35 orf 27 .pep RLLSELSIRHHQRNGWLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEP 82 

I M M I M M M M I M M II M M I I II 1111111111111111111111 
orf27ng RLLSELSIRHHKRNGWLEWYEDGSKKSEAVYQDDKLVRKTQWDKDGYLIEP 24 5 

The complete length ORF27ng nucleotide sequence <SEQ ID 703> is: 

1 ATGAAGAAAT TATCTCGGAT TGTATTTTCA ATCGTACTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGGA

101 AACTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

151 GCGGCGGGTA TCGCACACGC GCAGGATTTT TATTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AATGGGTCAA

351 CTGGTATCCG AACGGTAAAA AATCTGCGGT TATGCCTTAT AAAAATGGCT 

401 TGAGTGAGGG TACGGGATAC CGTTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAGCAAAA TAAGGCGAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGATGGA AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG 

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTTTC GGAACTGTCT

601 ATCCGCCACC ATAAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG AGCGAGGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA 

701 CCCAATGGGA TAAGGATGGT TATTTAATCG AACCCTGA 

This encodes a protein having amino acid sequence <SEQ ID 704>:

1 MKKLSRIVFS IVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV
51 AAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK
101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES
151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS
201 IRHHKRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP*

ORF27ng and ORF27-1 show 98.8% identity in 245 aa overlap:

10 20 30 40 50 60 

orf 27-1 . pep MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSV^/AGIAHAQDF 




Elllllllll Itlllll I llMllllM|l|||t|||||tllMIM!l:l!MMIII 
orf27ng MKKLSRIVFSIVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVAAGIAHAQDF 

10 20 30 4 0 50 60 

70 80 90 100 HO 120 

or f 27-1 pep YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFWGQKKMAGGFSKGKPDGEWVNWYP 
I I ! I I M I I t I I | I 1 I I I M I M I I I I I I M I i I I i II 1 I I N i I I I M I I I I i I t II I I 
orf27ng YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP 

70 60 90 100 HO 120 



130 140 150 160 170 180 

orf 27-1 . pep NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 k i 1 1 1 1 1 1 1 1 1 1 ii 1 1 1 1 1 1 1 1 1 1 1 i f 1 1 1 1 1 1 1 1 i 1 1 1 

orf27ng NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN 
15 130 140 150 160 170 180 

190 200 210 220 230 240 

orf 27-1 . pep DEPAKILTWDSSGRLLSELSIRHHQRNGWLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG 
I I I I 11 I I II I I I { t I I M I I I I I : 1 II I I I 1 I I I I I I i I i i 1 I I I I I I t i I I I I I I I I I 
20 orf27ng DEPAKILTWDESGRLLSELSIRHHKRNGWLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG 

190 200 210 220 230 240 

orf27-1.pep YLIEPX
            ||||||
orf27ng     YLIEPX
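
The identity figure quoted for this ungapped alignment can be reproduced with a few lines of Python. This is a sketch for illustration, not the program used for the original comparison. It assumes that ORF27ng differs from ORF27-1 only at the three positions seen when <SEQ ID 700> and <SEQ ID 704> are read side by side (T11I, V51A and Q205K); `percent_identity` and `substitute` are hypothetical helper names.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def substitute(seq: str, changes: dict) -> str:
    """Apply single-residue substitutions given as {1-based position: residue}."""
    residues = list(seq)
    for position, residue in changes.items():
        residues[position - 1] = residue
    return "".join(residues)


# ORF27-1 <SEQ ID 700>, 245 residues, written in blocks of ten as in the listing.
ORF27_1 = (
    "MKKLSRIVFS" "TVLLGFSAAL" "PAQTYSVYFN" "QNGKLTATMS" "SAAYIRQYSV"
    "VAGIAHAQDF" "YYPSMKKYSE" "PYIVASTQIK" "SFVPTLQNGM" "LILWHFNGQK"
    "KMAGGFSKGK" "PDGEWVNWYP" "NGKKSAVMPY" "KNGLSEGTGY" "RYYRNGGKES"
    "EIQFKQNKAN" "GVWKQWYADG" "SIKTEMVMVN" "DEPAKILTWD" "ESGRLLSELS"
    "IRHHQRNGVV" "LEWYEDGSKK" "SEAVYQDDKL" "VRKTQWDKDG" "YLIEP"
)
# Assumed differences in ORF27ng relative to ORF27-1.
ORF27NG = substitute(ORF27_1, {11: "I", 51: "A", 205: "K"})

print(round(percent_identity(ORF27_1, ORF27NG), 1))  # 242 of 245 positions match
```

With 242 identical positions out of 245 the value rounds to 98.8%, in agreement with the figure stated for ORF27ng and ORF27-1.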

Based on this analysis, including the putative leader sequence in the gonococcal protein, it was
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF27-1 (24.5kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
17A shows the results of affinity purification of the GST-fusion protein, and Figure 17B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA, which gave a positive result, confirming that ORF27-1 is
a surface-exposed protein and a useful immunogen.

Example 84 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 705>: 

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG

101 GCTACACGGG AACGCACkAG CTGTCCGGTT TCTATTGGCA CGCGCATGAg 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTaTCTGGTC 

251 GGCTTGACTA TCTTTTGGCT GGCTGCGCGG ATTGCCGCCT TTATCCCGGG 

301 TTGGGGTGCG TCGGCAAGCG GCATACTCGG TACGCTGTTT TTCTGGTACG

351 GCGCGGTGTG CATGGCTTTG CCCGTTATCC GTTCGCAGAA TCAACGCAAC 

401 TATGTTgCCG TGTTCGCGCT GTTCGTCTTG GGCGGCACGC ATGCGGCGTT 

451 CCACGTCCAG CTGCACAACG GCAACCTAGG CGGACTCTTG AGCGGATTGC 

501 AGTCGGGCTT GGTGATG 

This corresponds to the amino acid sequence <SEQ ID 706; ORF47>:

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHX LSGFYWHAHE
51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG
101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF
151 HVQLHNGNLG GLLSGLQSGL VM

Further work revealed the complete nucleotide sequence <SEQ ID 707>: 

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG 

251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAACT

401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGCACGCA TGCGGCGTTC 

451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 

501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA 

551 TTATTTCGTT TTTTACGTCC AAACGCTTGA ATGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACTGCCAT

651 GCTGATGGCG CACGGTGTGT TGGCTTGGCT GTCTGCCGTT TTTGCCTTTG 

701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAACCC 

751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC 

851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT

901 TTGGGCATGA TGGCGCGTAC CGCGCTTGGT CATACGGGCA ATCCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATCCGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC 

1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG

1151 GTTGA 

This corresponds to the amino acid sequence <SEQ ID 708; ORF47-1>:

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE
51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG
101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF
151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS
201 PKWVAQASLW LPMLTAMLMA HGVLAWLSAV FAFAAGVIFT VQVYRWWYKP
251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT
301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH
351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

Computer analysis of this amino acid sequence predicts a leader peptide and also gave the 
following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF47 shows 99.4% identity over a 172aa overlap with an ORF (ORF47a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf47 Pep MKFTPCHPVWAMAFRPFYSIAALYGALSVLLWGFGYTGTHXLSGFYWHAHEMIWGYAGLVV 

orf47a MK^KHPVWAMAn*PFYSlJ^LYGALS^^ 
45 10 20 30 40 50 60 

70 80 90 100 110 120 

orf47 oeo t &TTT.7 .T&v&TWTflOPPTRGG VLVGLT I FWLAAR IAAFI PGWGAS ASGI LGTLFFWYGAVC 
lltlllll l I I I I I I Ml I I MIMIM II II I II IMI 1 I I I II II 1 I I I I I I I I I I I t 
50 orf47a I AFLLTAVA TWTGQPPTRGG VLVGLT I FWLAARIAAFI PGWGASASG I LGTLFFWYGAVC 

70 80 90 100 110 120 

130 140 150 160 170 

orf47 oeD MALPVIRSONORN YVAVFALFVLGGTHAAF HVQLHNGNLGGLLSGLQSGLVM 

55 mini iii 1 1 linn iiiiiimi ti i ii ii NiMM mini ii ii 

orf47a MAT.PVIRSONQRKYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI 
130 140 150 160 170 180 

orf47a GTRI ISFFTSKRLNVPQI PS PKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVI FT 
50 ^ 190 200 210 220 230 240 






The complete length ORF47a nucleotide sequence <SEQ ID 709> is:

1 ATGAAATTTA CCAAGCACCC CGTTTGGGCA ATGGCGTTCC GCCCGTTTTA
51 TTCACTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG
101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG
151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC
201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG
251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT
301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG
351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAATT
401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGTACGCA CGCGGCGTTC
451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA
501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA
551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ATGTGCCGCA GATTCCCAGT
601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACCGCCAT
651 GCTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG
701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAGCCT
751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC
801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC
851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT
901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATCCGATTTA
951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA
1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC
1051 AGCATACGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC
1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG
1151 GTTGA

This encodes a protein having amino acid sequence <SEQ ID 710>:

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE
51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG
101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF
151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS
201 PKWVAQASLW LPMLTAMLMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP
251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT
301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH
351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

ORF47a and ORF47-1 show 99.2% identity in 384 aa overlap:

10 20 30 40 50 60 

orf 47a .peo MKFTKHPWA^FRPFYSIAALYGAISVLLWGFGYTGTHELSGFYWHAH^ 

orf 47-l ^„KHPVWAMAFRPFYSIAALYGALSVLLWGFGYTGTHELSGFYWHA^IWGYAGLW 
40 * ' 10 20 30 40 50 bU 



orf47a.pep 
45 orf47-l 



70 80 90 100 110 120 

IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 

iii || ! mi i in 1 1 ii ii ii i in inn ii ii inn i ii ii ii i ii 

IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASA^ 

70 80 90 100 110 120 



130 140 150 ISO 170 180 

orf47a oeo maLPVIRSQKQRKYVAVFALFVLGGTKAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI 

orf47a.pep 7^1 III Ml III I 1 1 1 1 1 1 1 I I I I I I I I I I I 1 1 I I I 1 1 I I 1 1 1 1 I I I I I I I I I I I 

50 orf47 .! ^[^iRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI 

130 140 150 160 170 180 

190 200 210 220 230 240 

55 orf 47a. pep GTRIISFFTSKRLNVPQIPSPKWAQASLWLPMLTAMLMAHGWPW 

1 1 | 1 1 1 1 | | | | 1 1 | I 1 1 I 1 1 1 I I I I I I I I I I I I 1 1 I I I I I I 1 1 : llll:IIMIIIIII 
orf47-l GTRI I S FFTSKRLNVPQI PS PKWVAQASLWLPMLTAMU1AHGVIAWLSAVFAFAAGVI FT 

190 200 210 220 230 240 

fin 250 260 270 280 290 300 

orf47a oep vOVYRWWYKPVLKEPtaBILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT 
orf 47a. pep T 1 1 1 1 1 I I III 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 1 1 • M ' » " ' • ' 1 1 1 1 1 1 1 1 1 1 1 1,1,1 
orf47-l V QVYRWVmCPVIj<EPMLWILFAGYXFTGU3LIAVGASYFKPAFXNI^^IGVGGIGVLT 
250 260 270 280 290 300 



65 



310 320 330 340 350 360 
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orf 47a . pep LGMMARTALGHTGNPIYPPPKAVPVAFWI21MMTAVRMVAVFSSGTAYTHSIRTSSVLFA 
llllitltllllltlllltlllililtllllMliiiiiiillMIIIIMllllllltl 
orf 47-1 LGMMARTAIX3HTGNPIYPPPKAVPVAFWLMMAATAVRIWAVFSSGTAYTHSIRTSSVLFA 

310 320 330 340 350 360 

370 380 
orf 47a .pep LALLVYAWKYI PWLIRPRSDGRPGX 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 

orf 47-1 LALLVYAWKYI PWLIRPRSDGRPGX 

370 380 

Homology with a predicted ORF from N. gonorrhoeae

ORF47 shows 97.1% identity over 172 aa overlap with a predicted ORF (ORF47ng) from
N. gonorrhoeae:






ORF47 



ORF47ng 

ORF47 

ORF47ng 



ORF47 



MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLW 
I 1 I I H I I I I I I I i I I I H I i I III I I I I I M I I I t I I I I I i I I I I M I ! I M I I I I I I I 
MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLW 



60 



60 
120 



IAFLLTAVATWTGQPPTRGGVLVGLTI FWLAARI AAFI PGWGAS ASGI LGTLFFWYGAVC 

I I I I I I I I I I I I M I I I I I t I I I I I I I I II I I I I I I I I II I I : II I I I I I II I I I I M I 

I AFLLT AVATWTGQPPTRGGVLVGLTAFWLAAR I AAFI PGWGAAASG I LGT LFFW YGAVC 120 

MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM 172 

II I I I I I I I I : I II I I I I I : I I II I I I I I I I II I I I I II II I II II I 1 I II I 
MALPVIRSQNRRNYVAVFAI FVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVWGFIGLI 180 



30 



35 



ORF47ng 

The ORF47ng nucleotide sequence <SEQ ID 711> is predicted to encode a protein comprising
amino acid sequence <SEQ ID 712>:

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE
51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG
101 WGAAASGILG TLFFWYGAVC MALPVIRSQN RRNYVAVFAI FVLGGTHAAF
151 HVQLHNGNLG GLLSGLQSGL VMVWGFIGLI GMKIISFFTS KRLKLPQIPS
201 PKWVAHASLW LPMLNAILMA HRVMPWLSAA FPFAAGVIFT VQVYAGGITP
251 IEETSCGSVA GICYRLGNSS G

The predicted leader peptide and transmembrane domains are identical (except for an Ile/Ala
substitution at residue 87 and a Leu/Ile substitution at position 140) to sequences in the
meningococcal protein (see also Pseudomonas stutzeri orf396, accession number e246540):
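
The substitutions named above can be located by comparing the two sequences position by position. The following Python sketch is illustrative and rests on stated assumptions: it uses only residues 81-140 of ORF47 and ORF47ng as read from the listings, it takes the transmembrane boundaries from the segment table for ORF47ng, and `substitutions` and `TM_SEGMENTS` are hypothetical names.

```python
def substitutions(ref: str, other: str, first_residue: int = 1):
    """List (position, ref_residue, other_residue) for every mismatch."""
    return [
        (first_residue + i, r, o)
        for i, (r, o) in enumerate(zip(ref, other))
        if r != o
    ]


# Residues 81-140 of the meningococcal and gonococcal proteins.
ORF47_81_140 = (
    "VLVGLTIFWL" "AARIAAFIPG" "WGASASGILG" "TLFFWYGAVC" "MALPVIRSQN" "QRNYVAVFAL"
)
ORF47NG_81_140 = (
    "VLVGLTAFWL" "AARIAAFIPG" "WGAAASGILG" "TLFFWYGAVC" "MALPVIRSQN" "RRNYVAVFAI"
)

# Predicted transmembrane segments of ORF47ng overlapping this window.
TM_SEGMENTS = [(82, 98), (107, 123), (134, 150)]

all_subs = substitutions(ORF47_81_140, ORF47NG_81_140, first_residue=81)
in_tm = [
    s for s in all_subs
    if any(start <= s[0] <= end for start, end in TM_SEGMENTS)
]
print(all_subs)
print(in_tm)
```

Within this window four positions differ, and only residues 87 (Ile/Ala) and 140 (Leu/Ile) fall inside the predicted transmembrane segments, consistent with the statement above.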

TM segments in ORF47ng

INTEGRAL    Likelihood = -5.63    Transmembrane   52 -  68
INTEGRAL    Likelihood = -3.88    Transmembrane  169 - 185
INTEGRAL    Likelihood = -3.08    Transmembrane   82 -  98
INTEGRAL    Likelihood = -1.91    Transmembrane  134 - 150
INTEGRAL    Likelihood = -1.44    Transmembrane  107 - 123
INTEGRAL    Likelihood = -1.38    Transmembrane  227 - 243
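The segment predictions above can be cross-checked against the two substitutions mentioned in the text (residues 87 and 140). The following is a minimal sketch, not the tool used in the original analysis: the line pattern is inferred from the listing, and the helper names `parse_tm_segments` and `segment_containing` are hypothetical.

```python
import re

# TM segment predictions for ORF47ng, transcribed from the listing above.
TM_REPORT = """\
INTEGRAL Likelihood = -5.63 Transmembrane 52 - 68
INTEGRAL Likelihood = -3.88 Transmembrane 169 - 185
INTEGRAL Likelihood = -3.08 Transmembrane 82 - 98
INTEGRAL Likelihood = -1.91 Transmembrane 134 - 150
INTEGRAL Likelihood = -1.44 Transmembrane 107 - 123
INTEGRAL Likelihood = -1.38 Transmembrane 227 - 243
"""

# Assumed line layout: a likelihood score followed by a residue range.
LINE_RE = re.compile(
    r"Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_tm_segments(report):
    """Return (start, end, likelihood) tuples sorted by start residue."""
    segments = [
        (int(m.group(2)), int(m.group(3)), float(m.group(1)))
        for m in LINE_RE.finditer(report)
    ]
    return sorted(segments)


def segment_containing(segments, residue):
    """Return the (start, end) range containing a residue, or None."""
    for start, end, _score in segments:
        if start <= residue <= end:
            return (start, end)
    return None


segments = parse_tm_segments(TM_REPORT)
for start, end, score in segments:
    print(f"{start:>3}-{end:<3}  likelihood {score:+.2f}")

# The two substitutions reported in the text.
for residue in (87, 140):
    print(residue, "lies in segment", segment_containing(segments, residue))
```

Under these assumptions, residue 87 falls in the 82-98 segment and residue 140 in the 134-150 segment, which agrees with the statement that both substitutions lie in predicted transmembrane domains.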



Further work revealed the complete gonococcal DNA sequence <SEQ ID 713>:

   1 ATGAAATTTA CCAAACATCC CGTCTGGGCA ATGGCGTTCC GCCCGTTTTA
  51 TTCACTGGCG GCACTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG
 101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG
 151 ATGATTTGGG GTTATGCCGG TCTCGTCGTC ATCGCCTTCC TGCTGACCGC
 201 CGTCGCCACT TGGACGGGAC AGCCGCCCAC GAGGGGCGGC GTTCTGGTCG
 251 GCTTGACCGC CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT
 301 TGGGGTGCGG CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG
 351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TtcgCAAAAC CGGCGCAACT
 401 ATGtcgCCGT ATTCGCAATA TTTGTGCTGG GCGGTACGCA TGCGgcgTTC
 451 CACGtccAgc tGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA
 501 GTCGGGCCTG GTTATGGTGT CGGGCTTTAT CGGCCTGATT GGGATGAGGA
 551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ACGTGCCGCA GATTCCCAGT
 601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTACCCATGC TGACCGCCAT
 651 ACTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG
 701 CGGCGGGCGT GATTTTTACC GTACAGGTGT ACCGCTGGTG GTATAAACCC
 751 GTATTGAAAG AACCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC
 801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCTGCCTTCC
 851 TCAATCTGGG CGTACATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT
 901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATTCGATTTA
 951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA
1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC
1051 AGCATCCGCA CGTCTTCGGT TTTGTTTGCA CTCGCGCTGC TGGTGTATGC
1101 GTGGAAATAC ATTCCGTGGC TGATCCGTCC GCGTTCGGAC GGCAGGCCCG
1151 GTTGA



This encodes a protein having amino acid sequence <SEQ ID 714; ORF47ng-1>:

  1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE
 51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG
101 WGAAASGILG TLFFWYGAVC MALPVIRSQN RRNYVAVFAI FVLGGTHAAF
151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GMRIISFFTS KRLNVPQIPS
201 PKWVAQASLW LPMLTAILMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP
251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT
301 LGMMARTALG HTGNSIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH
351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

ORF47ng-1 and ORF47-1 show 97.4% identity in 384 aa overlap:

                     10        20        30        40        50        60
orf47-1.pep  MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1    MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf47-1.pep  IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
             |||||||||||||||||||||||||| ||||||||||||||||:||||||||||||||||
orf47ng-1    IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf47-1.pep  MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
             ||||||||||:||||||||:||||||||||||||||||||||||||||||||||||||||
orf47ng-1    MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf47-1.pep  GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT
             | ||||||||||||||||||||||||||||||||||:||||||: ||||:||||||||||
orf47ng-1    GMRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAILMAHGVMPWLSAAFAFAAGVIFT
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf47-1.pep  VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1    VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf47-1.pep  LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
             |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1    LGMMARTALGHTGNSIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
                    310       320       330       340       350       360

                    370       380
orf47-1.pep  LALLVYAWKYIPWLIRPRSDGRPGX
             |||||||||||||||||||||||||
orf47ng-1    LALLVYAWKYIPWLIRPRSDGRPGX
                    370       380

Furthermore, ORF47ng-1 shows significant homology to an ORF from Pseudomonas stutzeri:

gnl|PID|e246540 (Z73914) ORF396 protein [Pseudomonas stutzeri] Length = 396
Score = 155 bits (389), Expect = 5e-37
Identities = 121/391 (30%), Positives = 169/391 (42%), Gaps = 21/391 (5%)

Ouerv 7 PVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFY WHAHEM I WG YAGLV 59 

^ y * P+W +AFRPF+ +LY L*+ LW +TG GF WH HEM++G+A + 

Sbjct: 14 PIWRIAFRPFFLAGSLYALIAIPLWVAAWTGLWP GFQPTGGWLAWHRHEMLFGFAMAI 71 

Query 60 VIAFLLTAVATWTGQPPTRGGVXVGLTAFWLAARIAAFI PGWGAAASGILGTLFFWYGAV 119 

V FLLTAV TWTGQ G LVGL A WLAAR+ ++ G AA L LF 
Sbjct: 72 VAG FT*LTAVQTWTGOTAPSGNRLVGLAAVWIAARL-GWLFGLPAAWLAP LDLLFLVALVW 130 

Query: 120 CMALPVIRSQNRRNYVAVFAim^THAAFXXXXXXXXXXXXXXX^ 1™ 

MA + + +RNY V + ++ G +V+ + L 

Sbjct: 131 MMAQMLW AVRQKRNY P I VWLS LMLGAD VL I LTG LLQGN DALQRQGVLAGLW LVAALMAL 190 

15 Query: 180 IGMRIISFFTSKRLNVPQIPSP-KWVAQASLWLPMLTAILMAHGV MPWLSAAFAFA 234 

IG R+I HTT + L P W+ A L + A+L A GV PL FA 

Sbjct: 191 IGGRVIPFFTQRGI^KVDAV1CPWVWLDVALLVGTGVIALLHAFGVAMRPQPLLGLLFV-A 249 

Query 235 AGVIFTVQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYF-KPAFXXXXXXXXXX 293 
70 GV +++ RW+ K + K +LW L L+ + + +F A 

Sbjct: 250 IGVGHIJ-RLMRWYDKGIWKVGLLWSLHVAMLWLWAAFGIALWHFGIXAQSSPSIJIALSV 309 

Query 294 XXXXXXXXXMMARTALGHTGNSIYPPPKAVPVAETfJI^XXXXXXXXXXFSSGTAYTHSIR 353 
M+AR LGHTG + P+AFL FS + 

25 Sbjcf 310 GSMSGLILAMIARVTLGHTGRPLQLPAGIIG-AFVL FN LGTAARV FLS VAW P VGGLW 365 



10 



30 



Query: 354 TS SVL FALALLVYAWK Y I PWL IRPRS DGR PG 384 

++V + LA +Y W+Y P L+ R DG PG 
Sbjct: 366 LAAVCWTLAFALYVWRYAPMLVAARVDGHPG 396 



Based on this analysis, it is predicted that the proteins from N.meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 85 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 715>: 

  1 ..ATGCCGTCTG AAGGTTCAGA CGGCmTCGGT GyCGGGGAAy CAGAAGyGGT

51 AGCGCATGCC CAATGAGACT TCGTGGGTTT TGAAGCGGGT GTTTTCCAAG 

101 CGTCCCCAGT TGTGGTAACG GTATCCGGTG TCyAArGTCA GCTTGGGyGT 

151 GATGTCGAAa CCGACACCGG CGATGACACC AAGACCyAmG CTGCTGATrC 

201 TGTkGCTTTC GTGATAGGsA GGTTTGyTGG kmksAsyTTG TAyrATwkkG 

251 CCTssCwsTG kAGmGCCkTk CkyTGGTkkA swGrvArTAG TCGTGGTTTy

301 TkTTyyCACC GAATGAACyT GATGTTTAAC GTGTCCGTAG GCGACGCGCG 

351 CGCCGATATA GGGTTTGAAT TTATCGTTGA GTTTGAAATC GTAAATGGCG 

401 GACAAGCCGA GAGAAGAAAC GGCGTGGAAG CTGCCGTTTC CCTGATGTTT 

451 TGTTTGGGTT TCTTTGTAGT TGTTGTTTAT CTCTTCAGTA ACTTTTTTAG 

501 TAGAAGAATT ACTTTCTTTC CATTTTCTGT AACTGGCATA ATCTGCCGCT

551 ATTCTCCAGC CGCCGAAATC 

This corresponds to the amino acid sequence <SEQ ID 716; ORF67>:

  1 .MPSEGSDGXG XGEXEXVAHA QXDFVGFEAG VFQASPVVVT VSGVXXQLGX
 51 DVETDTGDDT KTXAADXVAF VIGRFXGXXL YXXAXXXXAX XWXXXXSRGF
101 XXHRMNLMFN VSVGDARADI GFEFIVEFEI VNGGQAERRN GVEAAVSLMF
151 CLGFFVVVVY LFSNFFSRRI TFFPFSVTGI ICRYSPAAEI ..
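The lower-case letters in SEQ ID 715 (m, y, r, k, s, w, v) read as IUPAC nucleotide ambiguity codes, and the X residues in SEQ ID 716 fall where a codon containing such a letter cannot be resolved to a single amino acid. That reading is an assumption drawn from the listings, not something the text states. The sketch below illustrates it with the standard genetic code; the function name `translate_ambiguous` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna):
    """Translate in frame 1; emit X when a codon is not uniquely resolved."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        # Expand every ambiguity code and collect the possible residues.
        residues = {
            CODON_TABLE["".join(bases)]
            for bases in product(*(IUPAC[b] for b in codon))
        }
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# First 60 nucleotides of SEQ ID 715 as listed above.
SEQ_715_START = (
    "ATGCCGTCTGAAGGTTCAGACGGCmTCGGTGyCGGGGAAyCAGAAGyGGT"
    "AGCGCATGCC"
)
print(translate_ambiguous(SEQ_715_START))
```

For this fragment the sketch yields MPSEGSDGXGXGEXEXVAHA, the first twenty residues of the ORF67 listing. A codon such as GGy still resolves to G because both expansions encode glycine.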

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae

ORF67 shows 51.8% identity over 199 aa overlap with a predicted ORF (ORF67ng) from 
N. gonorrhoeae:
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MPSEGSDGXGXGEXEXVAHAQXDFVGFEAG 30 
orf67 ' peP II till II I || I I UN Ml MM 

orf67ng tnfeiavlsgmtvrvfycarpapvnggri^psegsdgigigeseavahaqrg^gfeag 146 

90 100 110 120 130 14U 



orf67.pep 



90 



VFQASPVVVTVSGVXXQU3XDVETDTGDDTKTXAADXVAFVIGRFXGXXLYXXAXXXXAX 

1 1 i I 1 1 1 1 1 : | : 1 1 I I 1 1 : : : : : M 1 II: M I ; : 
orf67ng V FQAS PVWAVAG VQGQAGRD VYAHARHRAEAQAAAAVAFLI GV FLRMS VRI NRN C CVS I 206 

orf 67 .pep XMXXXXSMI^^ 150 

I : I : : : : 1 | I I I I I : M I I M : I I I I ! I I I I II I M I I I I M III 
orf67ng TRVGGKSTCYFFSRIDAVSDVSVGDARTDIGFEFWEFEIVNGGQAERRNGVECAVFLMF 266 

crf67 pep CLGFFW WYLFSNFFSRRITFF-PFSVTGIICRYSPAAEI 190 

I || :: I: I: : I : M Mill : M IM 

orf67ng RLLVEWKLVAAKSFIILSFQLFYVHGIFIVVPFPVTGIIRGDAPAAEVVADRHPGVDGM 326 

The ORF67ng nucleotide sequence <SEQ ID 717> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 718>:

1 MFSETVGSIV NVGVDESVGF SPPFPSIQHF YRFHRIHRIR LFRPPGPMQL 
51 NRHSHGSGNL GRGVWATVLS DKFPCGQVRI PACAGMTNFE IAVLSGMTVR 
101 VFYCARPAPV NGGRLKMPSE GSDGIGIGES EAVAHAQRGF VGFEAGVFQA 
151 SPWVAVAGV QGQAGRDVYA HARHRAEAO A AAAVAFLIGV FLRMS V R IN R 
201 NCCVSITRVG GKSTCYFFSR IDAVSDVSVG DARTDIGFEF WEFEIVNGG 
251 mvKnUMZ C AVFLMFRLLV FYVKLV AAKS FIILSFQLFY VHGIFIW PF 
301 PVTGI IRGDA PAAEWADRH PGVDGMRTDV SEIIAYRAYF VFAWSGWFRI 
351 IVGNAFGGVG * 

Based on the presence of several putative transmembrane domains in the gonococcal protein, it 
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 86 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 719>:

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GArArTCCTA rGGTTCArAC 

251 CTATTGCGsG CATCATGACG CCGrAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCA...

This corresponds to the amino acid sequence <SEQ ID 720; ORF78>: 

  1 MFAFLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP
 51 HIMFAVGMLG VLVGDGIMFA AGRIWGQXXL XFXPIAXIMT PXRYEQVQEK
101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFIIM DGLAA...

Further work revealed the complete nucleotide sequence <SEQ ID 721>:

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GAAAATCCTA AGGTTCAAAC 

251 CTATTGCGCG CATCATGACG CCGAAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCACTGAT TTCCGTCCCT 

451 ATTTGGATTT ATCTGGGCGA ATACGGTGCG CACAACATCG ATTGGCTGAT 



501 GGCGAAAATG CACAGCCTGC AATCGGGTAT TTTTGTTATC TTGGGTATAG
551 GTGCGACCGT TGTCGCTTGG ATTTGGTGGA AAAAACGCCA ACGTATCCAG 
601 TTTTACCGCA GCAAATTGAA AGAAAAGCGG GCGCAACGCA AAGCCGCCAA 
651 GGCAGCCAAA AAAGCCGCGC AAAGCAAACA ATAA 

This corresponds to the amino acid sequence <SEQ ID 722; ORF78-1>:

  1 MFAFLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP
 51 HIMFAVGMLG VLVGDGIMFA AGRIWGQKIL RFKPIARIMT PKRYEQVQEK
101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFIIM DGLAALISVP
151 IWIYLGEYGA HNIDWLMAKM HSLQSGIFVI LGIGATVVAW IWWKKRQRIQ
201 FYRSKLKEKR AQRKAAKAAK KAAQSKQ*

Computer analysis of this amino acid sequence predicts several transmembrane domains, and also 
gave the following results: 

Homology with the dedA homologue of H. influenzae (accession number P45280)
ORF78 and the dedA homologue show 58% aa identity in 144aa overlap: 

Orf78:   4 FLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGV 61
           FL  FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+     N H+M  V M+GV
DedA:   20 FLIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGV 79

Orf78:  62 LVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTA 121
           L GD  M+  GRI+G   L F PI  I+T  R   V+EKF +YGN VLFVARFLPGLR
DedA:   80 LAGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAP 139

Orf78: 122 VFVTAGISRKVSYLRFIIMDGLAA 145
           +++ +GI+R+VSY+RF+++D  AA
DedA:  140 IYMVSGITRRVSYVRFVLIDFCAA 163

Homology with a predicted ORF from N. meningitidis (strain A)

ORF78 shows 93.8% identity over a 145aa overlap with an ORF (ORF78a) from strain A of N. 
meningitidis: 

30 10 2C 30 40 50 60 

orf78 pep MFAFLEAFFVEYG YAAVFFVLVICGFGVPI PEDLTLVTGGVISGMGYTNPH IMFAVGMLG 
| | I z I 1 I I I I I I I I I I 1 1 1 1 1 1 1 1 I I t I t I I 1 1 1 1 1 1 1 I I 1 I 1 I I I I 1 1 1 1 1 1 I I 1 I t 1 I 
O r f 7 8 a MFALLEAFFVEYG YAAVFFVLVI CGFGVP I PE DLTLVTGGV I SGMG YTNPH IMFAVGMLG 

10 20 30 40 50 60 

^ 70 80 90 100 110 120 

or f 78 . pep VLV GDGIM FAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNW VLFVARFLPGLRT 
I t I I I I 1 I I I I I I I I 1 I I I Ml MM II MM II II M II MMM I I M I M 
0 ~f78a VLVGDGIM FAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNW VLFVARFLPGLRT 

40 70 80 90 100 110 120 

130 140 
orf 78 .pep AVFV TAGI SRKVSYLR FI IMDGLAA 
M II I I I M I I II II M : I II I I I I 
45 orf 78a AVFVTAGISRKVSYLR FLIMDGLAALI SVPWI YLGEYGAHNIDWLMAKMHSLQSGIFIA 

130 ^40 150 160 170 180 

The complete length ORF78a nucleotide sequence <SEQ ID 723> is: 

1 ATGTTTGCCC TTTTGGAAGC CTTTTTTGTC GAATACGGCT ATGCGGCCGT 

51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 

251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCACAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGCAACTG GGTGTTATTT GTCGCTCGTT TCCTGCCCGG 

351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT

401 ATCTGCGCTT TCTGATTATG GACGGGCTTG CCGCGCTGAT TTCCGTGCCC 

451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT

501 GGCGAAAATG CACAGCCTGC AATCCGGCAT CTTCATCGCA TTGGGCGTGC

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG 

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAA 

This encodes a protein having amino acid sequence <SEQ ID 724>:

  1 MFALLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP
 51 HIMFAVGMLG VLVGDGIMFA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK
101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFLIM DGLAALISVP
151 VWIYLGEYGA HNIDWLMAKM HSLQSGIFIA LGVLAAALAW FWWRKRRHYQ
201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ*



ORF78a and ORF78-1 show 89.0% identity in 227 aa overlap:



10 20 30 40 50 60 

orf78a pep MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 
i | | r 1 I 1 1 } t I t I t 1 I t I I I t I I I 1 1 I I I 1 I I t 1 I I I S t 1 1 I I 1 I I I 1 I 1 1 1 I I 1 I I 1 1 f 
orf78-l MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGKLG 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 7 8a . pep VLVGDGIMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT 
! | | | I I I I I I I I I It M I I I : I I I I t I ! I I I I I 1 I I I II I I I I I I t I I II I I I I I I 1 I I 
orf 78-1 VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 78a . pep AVFNH'AGISRKVSYLRFLIMIXaLAALlSVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA 

I M I I I I I I ! ! I 1 ( I I I : t I I ! t i i I t i I i : I M I I ! M I I ! 1 1 I I I I I 1 I I I I I 1 I I : 
orf 78-1 AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFV1 

130 140 150 160 170 180 



30 



35 



190 200 210 220 

orf 78a . pep LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX 
||: | : : : I I : I I : I I : : I : I I : : I : ill I : I I I I M I I I I I : : I I 
or f 7 8- 1 LGIGATWAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX 

190 200 210 220 



Homology with a predicted ORF from N. gonorrhoeae

ORF78 shows 97.4% identity over 38 aa overlap with a predicted ORF (ORF78ng) from N. 
gonorrhoeae:

orf78.pep   XXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTAVFVTAGISRKVSYLRF 137
                                          ||||||||||||||||||||||||||||||
orf78ng                                 YPVLFVARFLPGLRTAVFVTAGISRKVSYLRF  32

orf78.pep   IIMDGLAA                                                     145
            :|||||||
orf78ng     LIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALGVLAAALAWFWWRKRR  92

The ORF78ng nucleotide sequence <SEQ ID 725> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 726>: 

  1 ..YPVLFVARFL PGLRTAVFVT AGISRKVSYL RFLIMDGLAA LISVPVWIYL
 51 GEYGAHNIDW LMAKMHSLQS GIFIALGVLA AALAWFWWRK RRHYQLYRAQ
101 LSEKRAKRKA EKAAKKAAQK QQ*

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 727>:

  1 atgtttgccc tttTggaagc CTTTTTTGTC GAAtacggCt atgcGGCCGT
 51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAAGATT
101 TGACCTTGGT AACGGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG
151 CATATTATGT TTGCGGTCGG TATGCTCGGC GTGTTGGCGG GCGACGGCGT
201 GATGTTTGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC
251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCGCAGGT TCAGGAAAAA
301 TTCGACAAAT ACGGCAACTG GGTTCTGTTT GTCGCCCGTT TCCTGCCGGG
351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT

401 ATCTGCGCTT TCTGATTATG GACGGGCTGG CCGCGCTGAT TTCCGTGCCC 

451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT 

501 GGCGAAAATG CACAGCCTGC AATCGGGCAT CTTCATCGCA TTGGGCGTGC 

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAa 

This corresponds to the amino acid sequence <SEQ ID 728; ORF78ng-1>:

  1 MFALLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP
 51 HIMFAVGMLG VLAGDGVMFA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK
101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFLIM DGLAALISVP
151 VWIYLGEYGA HNIDWLMAKM HSLQSGIFIA LGVLAAALAW FWWRKRRHYQ
201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ*

ORF78ng-1 and ORF78-1 show 88.1% identity in 227 aa overlap:

15 10 20 30 40 50 60 

orf78-l pep MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 
| | | : t I M II I II I I I I I I I I I I I I II II I I I I I I I I II I I I I I I I I M I II I I II I I I I 
orf78ng-l MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 
10 20 30 40 50 60 

20 

70 80 90 100 110 120 

orf78-l pep VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT 
t | : I I 1 : I I I I I I I I I I I I 1 : I i I I M I I I I i I t I I I I I ! I I I I I I M I I I 1 I M I I I I 
orf78ng-l VLAGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGKWVLFVARFLPGLRT 
25 ^~ J 70 80 90 100 110 120 

130 140 150 160 170 180 

O r f 7 8 - 1 . pep AVFVTAGI SRKVSYLRFI IMDGLAALISVPIWI YLGEYGAHN IDWLMAKMHSLQSGI FVI 
I I 1 ! I I I I 1 I I 1 1 I 1 1 1 : 1 M 1 1 1 1 I I I I I = 1 1 1 1 I I 1 1 I t I 1 1 1 1 1 I I I 1 I I I I K 1 1 : 
30 orf78ng-l AVFVTAGI SRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHN I DWLMAKMHSLQSGIFIA 

130 140 150 160 170 180 

190 200 210 220 

orf78-l .peD LGIGATWAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX 
35 * | | : | : : : I | : 1 I : I I : : I : I I : : I : I I I I : I I 1 I I I I ( I I I : : M 

or f 7 8 ng - 1 LGVIAAA1JVWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX 

190 200 210 220 

Furthermore, ORF78ng-1 shows homology to the dedA protein from H. influenzae:

sp|P45280|YG29_HAEIN HYPOTHETICAL PROTEIN HI1629 >gi|1073983|pir||D64133 dedA 
protein (dedA) homolog - Haemophilus influenzae (strain Rd KW20)
>gi|1574476 (U32836) dedA protein (dedA) [Haemophilus influenzae] Length = 212
Score = 223 bits (563), Expect = 7e-58
Identities = 108/182 (59%), Positives = 140/182 (76%), Gaps = 2/182 (1%)

Query:   5 LEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGVL 62
           L  FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+     N H+M  V M+GVL
Sbjct:  21 LIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGVL 80

Query:  63 AGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRTAV 122
           AGD  M+  GRI+G KIL+F+PI RI+T +R   V+EKF +YGN VLFVARFLPGLR  +
Sbjct:  81 AGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAPI 140

Query: 123 FVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALG 182
           ++ +GI+R+VSY+RF+++D  AA+ISVP+WIYLGE GA N+DWL  ++   Q  I+I +G
Sbjct: 141 YMVSGITRRVSYVRFVLIDFCAAIISVPIWIYLGELGAKNLDWLHTQIQKGQIVIYIFIG 200

Query: 183 VL 184
            L
Sbjct: 201 YL 202

Based on this analysis, including the presence of putative transmembrane domains, it is predicted 
that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies.

Example 87 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 729>:

1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 

351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA C... 

This corresponds to the amino acid sequence <SEQ ID 730; ORF79>:

  1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA
 51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNH..

Further work revealed the complete nucleotide sequence <SEQ ID 731>:

  1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT
 51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG
101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC
151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA
201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG
251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC
301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA
351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC
401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA CGGTCATCAC
451 CACGGCGAAG CGCATCAGCA CTAA

This corresponds to the amino acid sequence <SEQ ID 732; ORF79-1>:

  1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA
 51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNHGHH
151 HGEAHQH*

Computer analysis of this amino acid sequence revealed a putative leader peptide and also gave the 
following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF79 shows 94.6% identity over a 147aa overlap with an ORF (ORF79a) from strain A of N. 
meningitidis:

40 10 20 30 40 50 60 
orf79 Pep MKKLLAAV MMAG LAGA V S AAGVHVE DGW ARTT VEGMK I GGA FMK I H N DE AKQD FL LGG S S 
H illlHMHIII IHM:MMm MIHHI:MM H I I MIMilMIHM 
or f 7 9a MKXLL AAVMMAGLAGA VSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS 
10 20 30 40 50 60 

45 

70 80 90 100 110 120 

orf79 Pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 

| 1 I | I 1 1 1 1 I I I ] t t t I 1 I I I I I 1 1 I I 1 I I I I I i 1 I I I 1 t I 1 i I 1 Ml Hill 

or ^7 9a PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP 
5Q 70 80 90 100 110 120 
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130 140 
or f 7 9 . pep VT LK FKN AKAQT VQLE VKI APMPAMNH 
I I I I I I I I I I I I I I 1 I I I HI >>:l 
5 orf 7 9a VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX 

130 140 

The complete length ORF79a nucleotide sequence <SEQ ID 733> is: 

1 ATGAAANAAC TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT

51 TTCCGCCGCC GGAATCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATGGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCTGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CATATCAATG ATAACGGTGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TCATGTTTAT GGGTNTGAAA AAACAATTAA AAGANGGCGA 

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCA CAAACCGTCC

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGGACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA

This encodes a protein having amino acid sequence <SEQ ID 734>: 

  1 MKXLLAAVMM AGLAGAVSAA GIHVEDGWAR TTVEGMKMGG AFMKIHNDEA
 51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG
101 SYHVMFMGXK KQLKXGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMDHGHH
151 HGEAHQH*

ORF79a and ORF79-1 show 94.9% identity in 157 aa overlap: 

10 20 30 40 50 60 

7<5 orf 7 9a oeP MKX LLAAVMMAGLAGAV S AAG I H VE DGW ARTTVEGMKMGG AFMK I HN DEAKQD FL LGG S S 

P P || 1 1 1 1 | | M | | | 1 1 1 I II I : I 1 1 I I 1 1 1 I 1 1 H 1 1 : I I I I I I I II I I I I I M I I I I I I 
orf 7 9-1 MKKLIAAVMMAGLAGAVS AAGVHVEDGWARTTVEQflCI GGAFMK I HN DEAKQDFLLGGS S 

10 20 30 40 50 60 

™ 70 80 90 100 110 120 

orf 7 9a pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP 
* P P 1 1 i 1 1 M I 1 1 i I I I M 1 1 1 M t 1 1 M I I I ! M M I I M } M 1 I I I I 1 1 Hill Hill 
orf 79-1 pvADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 
70 80 90 100 110 120 

35 

130 140 150 

orf 7 9a pep VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX 
MINIUM II li Mil III II:IIIIIII1IIH 
or f 7 9- 1 VTLKFKNAKAQTVQLEVKI APMPAMNHGHHHGEAHQHX 

40 130 140 150 

Homology with a predicted ORF from N. gonorrhoeae

ORF79 shows 96.1% identity over 76 aa overlap with a predicted ORF (ORF79ng) from 
N. gonorrhoeae: 

orf 7 9 oeo FMKIHNDEAKQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGS 101 

-**P " | | | | | | | I | H I : I It H H H i H I I II I 

orf79ng INDNGVMRMREVKGGVPLEAKSVTELKPGS 30 

orf 7 9 pep YHVMFWGlJCKQLKEGDKIPVTLKn<HAKAOTVQlXVKIAPMPAMNH 147 
50 I 1 1 1 1 1 1 1 1 I 1 1 K I 1 1 1 1 1 1 1 1 1 1 I I 1 1 1 1 1 1 1 1 I 1 1 III llll 

orf 7 9ng YHVMFMGUCKOLKEGDKIPVTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQH 86 

An ORF79ng nucleotide sequence <SEQ ID 735> was predicted to encode a protein comprising 
amino acid sequence <SEQ ID 736>: 

1 INDNGVMRMR EVKGGVPLEA KSVTELKPGS YHVMFMGLKK QLKEGDKIPV 
 51 TLKFKNAKAQ TVQLEVKTAP MSAMNHGHHH GEAHQH*

Further work revealed the complete gonococcal DNA sequence <SEQ ID 737>:

  1 ATGAAAAAAT TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT
 51 TTccgccgCc GGagTccAtG TCGAggACGG CTGGGCGCGc accaCTGtcg
101 aaggtATgaa aatggGCGGC GCgttCATga aaATCCACAA CGACGaaGcc
151 atacaaGACt ttgtgcTCgg CGGaagcatg cccgttgccg accgcGTCGA
201 AGTGCAtaca cacATCAACG ACAACGGCGT GATGCGTATG CGCGAAGTCA
251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC
301 AGCTATCACG TGATGTTTAT GGGTTTGAAA AAACAACTGA AAGAGGGCGA
351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC
401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGAACCA CGGTCATCAC
451 CACGGCGAAG CGCATCAGCA CTAA

This corresponds to the amino acid sequence <SEQ ID 738; ORF79ng-1>:

  1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKMGG AFMKIHNDEA
 51 IQDFVLGGSM PVADRVEVHT HINDNGVMRM REVKGGVPLE AKSVTELKPG
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMNHGHH
151 HGEAHQH*

ORF79ng-1 and ORF79-1 show 95.5% identity in 157 aa overlap:

                     10        20        30        40        50        60
orf79-1.pep  MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS
             |||||||||||||||||||||||||||||||||||||:|||||||||||| |||:||||
orf79ng-1    MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSM
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf79-1.pep  PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP
             |||||||||||||||||||||||:||||||||||||||||||||||||||||||||||||
orf79ng-1    PVADRVEVHTHINDNGVMRMREVKGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP
                     70        80        90       100       110       120

                    130       140       150
orf79-1.pep  VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX
             |||||||||||||||||| ||| |||||||||||||||
orf79ng-1    VTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQHX
                    130       140       150
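The identity figure quoted for this pair can be reproduced directly from the two sequences, since the alignment contains no gaps. The following is a minimal sketch: the helper name `percent_identity` is hypothetical, the sequences are transcribed from the ORF79-1 and ORF79ng-1 entries in this example, and the terminal stop position is left out of the count.

```python
def percent_identity(seq_a, seq_b):
    """Count identical positions in an ungapped, equal-length overlap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped comparison needs equal-length sequences")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches, 100.0 * matches / len(seq_a)


# Sequences transcribed from this example; stop codon excluded.
ORF79_1 = (
    "MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEA"
    "KQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPG"
    "SYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKIAPMPAMNHGHH"
    "HGEAHQH"
)
ORF79NG_1 = (
    "MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKMGGAFMKIHNDEA"
    "IQDFVLGGSMPVADRVEVHTHINDNGVMRMREVKGGVPLEAKSVTELKPG"
    "SYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKTAPMSAMNHGHH"
    "HGEAHQH"
)

matches, pct = percent_identity(ORF79_1, ORF79NG_1)
# Positions (1-based) where the two proteins differ.
differences = [
    i + 1 for i, (a, b) in enumerate(zip(ORF79_1, ORF79NG_1)) if a != b
]
print(f"{matches}/{len(ORF79_1)} identical ({pct:.1f}%)")
print("differing positions:", differences)
```

With these inputs the sketch counts 150 identical residues out of 157, which rounds to the 95.5% stated above. The same function would not apply to the gapped BLAST comparisons, which need an alignment step first.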

Furthermore, ORF79ng-1 shows significant homology to a protein from Aquifex aeolicus:

gi|2983695 (AE000731) putative protein [Aquifex aeolicus] Length = 151
Score = 63.6 bits (152), Expect = 6e-10
Identities = 38/114 (33%), Positives = 58/114 (50%), Gaps = 1/114 (0%)



40 



Querv: 24 VEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSMPVADRVEVHTHINDNGVMRMREV 83 

" V+ W G M I N+ D+++G +A RVE+H + +N V +M 

Sbjct: 27 VKHPWVMEPPPGPNTTMMGMIIVNEGDEPDYLIGAKTDIAQRVELHKTVIENDVAKMVPQ 86 



Query: 84 KGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEV 137 
+ + + K E K YHVM +GLKK+ +KEGDK+ V L F+ + TV+ V 
45 Sbjct: 87 ER-IEIPPKGKVEFKHHGYHV>1IIGLKKRIKEGDKVKVELIFEKSGKITVEAPV 139 

Based on this analysis, it is predicted that the proteins from N.meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF79-1 (15.6 kDa) was cloned in the pET vector and expressed in E. coli, as described above. The 
products of protein expression and purification were analyzed by SDS-PAGE. Figure 18A shows 
the results of affinity purification of the His-fusion protein. Purified His-fusion protein was used 
to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure 
18B). These experiments confirm that ORF79-1 is a surface-exposed protein, and that it is a useful 
immunogen.
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Example 88 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
739>:

  1 ATGACGGTAA CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA
 51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT
101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG
151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT
201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG
251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG
301 CGGATTCCGG TTGTGAAAtC CATCTATTCG AGTGTGAAAA AAGTATCCGA
351 ATacgTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC
401 CGTTTCCCCA GCCCGGTATT TGGACGATyG CTTTCGTGTC AGGGCAGGTG
451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAs GACGGCGATT ATCTTTCCGT
501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA
551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AsCATTGAAA
601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC
651 ATTGGCAsGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT
701 AA

This corresponds to the amino acid sequence <SEQ ID 740; ORF98>:

  1 MTVTAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL
 51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG
101 RIPVVKSIYS SVKKVSEYVL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV
151 SNAVKAALPX DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEXLK
201 YVISLGMVIP DDLPVKTLAX PMPSEKADLP EQQ*

25 Further work revealed the complete nucleotide sequence <SEQ ID 741>: 

1 ATGACGGAAC nTGCGGCCGA AGGCGGCAAA GCTGCCAArG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

10 201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 

301 CGGATTCCGG TTGTGAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA GCCCGGTATT TGGACGATTG CTTTCGTGTC AGGGCAGGTG 

IS 451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCATTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA

This corresponds to the amino acid sequence <SEQ ID 742; ORF98-1>:

1 MTEXAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL
51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG
101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV
151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK
201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF98 shows 96.1% identity over a 233 aa overlap with an ORF (ORF98a) from strain A of N. meningitidis:

10 20 30 40 50 60 

orf98 oeo MTVTAAEGGKAAKALKKYLITGILWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 
o 9 pep H Tit I 1 1 I I I 1 1 I 1 1 1 1 1 1 f 1 1 1 1 1 1 1 I I i I I I 1 1 1 1 M 1 1 1 1 1 1 1 1 1 i It I I t I I I 
orf98a MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 
55 * * 10 20 30 40 50 60 
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70 



80 



10 



15 



orf98.pep 
orf98a 

orf 98 .pep 
orf98a 

orf 98. pep 
orf 98a 



90 100 110 120 

GFNIPGLGVIVMAVLFVTGLFAAWLGRQII^WDSLLGRIPVVKSlYSSVKKVSEyVL 

| I I 1 1 I I I 1 1 1 M ! H 1 1 1 1 I I i M I I M I I t I I I i i I i | | | | | | i ! | I I ! I i M I : I 
GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSXSLL 

70 80 90 100 110 120 

130 140 I 50 160 170 180 

S DS SRS FKTPVLVPFPQPGIWTI AFVSGQVSNAVKAALPXDGDYLSVYVPTT PNPTGG YY 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 miiiiiiiMiiiiiiMi iimmmiMiiiiii 

SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 
130 140 150 160 170 180 

190 200 210 220 230 

IMVKKS DVRELDMS VDEXLKYVI SLGMVI PDDLPVKTLAXPMPSEKADLPEQQX 

I 1 I 1 I I I I I I f I I I 1 I 1 I 1 1 I I 1 1 I i t t I 1 I I I I I I I I MIIMII I 

IMVKKS DVRELDMSVDEALKYVI SLGMVI PDDLPVKTLAG PMPSEKADLPEQOX 

190 200 210 220 230 



The complete length ORF98a nucleotide sequence <SEQ ID 743> is:

1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA
51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT
101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG
151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT
201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG
251 CAAACGTATT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CTTGTTGGGG
301 CGGATTCCGG TTGTGAAGTC CATCTATTCG AGTGTGAAAA AAGTATCCGA
351 NTCGTTGCTG TCCGACAGCA GCCGTTCGTT TAAAACACCA GTACTCGTGC
401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG
451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT
501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA
551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA
601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC
651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT
701 AA



This encodes a protein having amino acid sequence <SEQ ID 744>:

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL
51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG
101 RIPVVKSIYS SVKKVSXSLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV
151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK
201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ*

ORF98a and ORF98-1 show 98.7% identity in 233 aa overlap: 






10 20 30 40 50 60 

orf 98a pep MTEPAAEGGKAAKAUCKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 

Ml | 1 I I I I I I 1 I ! I I I I I i M I I I I I t I lj II III II MM II III! I II I I I 

orf 98-' MTEXAAEGGKAAKALKICyLITGILWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPOYVL 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 98a pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSXSLL 

m 1 1 1 1 1 1 1 1 1 i i n i i n n i i i i i i i i i i i i i i i m i i i i ! i i i i i i n n i i in 

orf 98-1 GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSESLL 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 98a peD S DS SRS FKTPVLVPFPQSGI WT IAFVSGQVSNAVKAALPKDGDYLSVYVPTTPN PTGG Y Y 
| M | f } f i 1 I I I M 1 M I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
orf 98-1 S DS SRS FKT PVLV P FPQPG I WT I AFV S GQ V S N A VKAAL PKDG D Y LS V YV PTT PN PTGG Y Y 

130 140 150 160 170 180 

190 200 210 220 230 

orf 98a pep IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 

r * MmiiiiiiniininniiiiiiimMMiiMiiiiiimmii 

or*98-l IMVKKS DVRELDMSVDEALKYVI SLGMVI PDDLPVKT LAG PMPSEKADLPEQQX 

190 200 210 220 230

Homology with a predicted ORF from N. gonorrhoeae

ORF98 shows 95.3% identity over a 233 aa overlap with a predicted ORF (ORF98ng) from N. gonorrhoeae:



XO 20 30 40 50 60 



orf 98 .pep MTVTAAEGGKAAKALmLITGILVWLPIAVTVVWSYI^ 

orf98ng mAmM^^ 

^LF^GLFAANVLGRQIIJ^WDSLl^RIPVVKSIYSS\raWSEyVL 



60 
60 
120 



orf 98 . pep GFNIPGI^IVAIAVLr^GLFAANVLGRQII^wu^.^x. * ----- • 7m 77 7 



orf98ng 



180 



15 orf98ng 

orf 98. pep 



orf 98 . pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDG^ 
oriyo.pep I i • i i I i I i I 1 I i | I I | J || | | 1 | l| M | I I | | II II I I I I I I I IH I I III I II H l 

sdUU^pv[vpfp^ 180 



IMVKKSDVRELDMSVDEXLKYVISLGMVIPDDLPVKTLAXPMPSEKADLPEQQ 233 

I I I I I I | I | | | | | | | | | | I I I I I I I I II I I I 11 I I I I I IN 111:11111 
IW^SD^LDMSVDEALKYVISWMVIPDD 233 



orf 98ng 

The complete length ORF98ng nucleotide sequence <SEQ ID 745> is predicted to encode a protein having amino acid sequence <SEQ ID 746>:

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL
51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLX
101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV
151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK
201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ*

Further work revealed the complete nucleotide sequence <SEQ ID 747>: 

1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA
51 ATATCTGATT ACAGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT
101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ACCAGCTTGT CAACCTGCTG
151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCCGGGCT
201 CGGCGTTATT GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG
251 CAAACGTGTT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CCTGTTgggg
301 cggaTTCCGG TTGTCAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA
351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC
401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG
451 TCGAATGCGG TTAAGGCCGC ATTGCCGCAG GATGGCGATT ATCTTTCCGT
501 GTATGTCCCG ACCACGCCCA ACCCGACCGG CGGTTACTAT ATTATGGTAA
551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA
601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC
651 ATTGGCAGGA CCTATGCCGC CTGAAAAGGC GGAGTTGCCC GAACAACAAT
701 AA

This corresponds to the amino acid sequence <SEQ ID 748; ORF98ng-1>:

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL
51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG
101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV
151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK
201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ*

ORF98ng-1 and ORF98-1 show 97.9% identity in 233 aa overlap:

cr, 10 20 30 40 50 60 

«rfqa-l pen MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 
orf98-l.pep MTEXAAEGGKAAKA ,,,,,,,,,,,,,,,,,,,,,,,,,,,, | | , | | 

Otf98 M -l MTEPAAE^KAAKALKKYLITGILVVJLPIAVT^ 

s 10 20 30 40 50 60 

55 70 80 90 100 110 120 

orf98-l pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL 
orf98 l.pep <J™» ,,,,,,, 1 1 1 1 I II I 1 1 1 1 1 H 1 1 • 1 1 I »l 1 1 1 1 1 I I M I 1 1 1 1 1 
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r>rf98nq-l GFNIPGLGVIVAIAVLmGLFAANVI^RQIU^WDSLUSRIPVVKSlYSSVKKVSESLL 
70 80 90 100 H° 120 

130 140 150 160 170 180 

SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 

I I 1 1 1 | | | J | | | | | | | | | I I I I i I HI I I i I I I I I I 1 I : I I I M I I I II I 1 I I ! I I I I i 
SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY 
130 140 150 160 170 180 

190 200 210 220 230 

IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 

MIlllllllMIMMIIlMMIIMItt Ilillll 111:111111 

IMVKKSDVRELDMSVDEALKYVI SLGMVI PDDLPVKTLAGPMPPEKAELPEQQX 
190 200 210 220 230 




Based on this analysis, including the fact that the putative transmembrane domains in the
gonococcal protein are identical to the sequences in the meningococcal protein, it is predicted that 
the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens 
for vaccines or diagnostics, or for raising antibodies. 
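
The percentage identities quoted in this example can be checked directly from the listed sequences. The following is a minimal sketch in Python, not the program used to generate the alignments above. It assumes two sequences of equal length aligned without gaps, and it counts an unresolved residue (X) as a mismatch; that treatment is an assumption, chosen because it reproduces the 98.7% figure given for ORF98a and ORF98-1. The function name `percent_identity` is illustrative only.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions that carry the same residue.

    Assumes a gap-free alignment of two equal-length sequences.
    An unresolved residue (X) is counted as a mismatch (assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "X")
    return 100.0 * matches / len(seq_a)


# ORF98a <SEQ ID 744> and ORF98-1 <SEQ ID 742>; the stop codon is omitted.
ORF98A = "".join("""
MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL
PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG
RIPVVKSIYS SVKKVSXSLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV
SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK
YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ
""".split())

ORF98_1 = "".join("""
MTEXAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL
PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG
RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV
SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK
YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ
""".split())

identity = percent_identity(ORF98A, ORF98_1)
print(f"{identity:.1f}% identity over {len(ORF98A)} aa")
```

Alignments that contain gaps would need a proper alignment step before counting; this sketch covers only the gap-free case seen for the ORF98 family.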

Example 89 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 749>:

1 ATgAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT
51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC
101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT
151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GsGgTACTCA
201 ATATCCCCGA AAAGATGCAG CGTTTCGGTT CGGCnCGTAA AGGCCkCAAG
251 ssCGsGCTTG CCTTGAACAA GGCGGGTTTG GCGTATTTTG AAGGGCGTTT
301 TGAAAAGGCG GAACTAGAAG CCTCACGCGT GTTGGTCAAC AAAGtAGGCC
351 GaGAGACAAC CGGACTTTGG CATTGATGCT GrGCGCGCAC GCCGCCGGAC
401 AGATGGAAAA CATCGAssTG CGCGACCGTT ATCTTGCGGA AATCGCCAAA
451 CTGCCGGAAA AACAGCAGCT TTCCCGTTAT CTTTTGTTGG CGGAATCGGC
501 GTTGAACCGG CGCGATTACG AAGCGGCGGA AGCCAATCTT CATGCGGCGG
551 CGAAGATGAA TGCCAACCTT ACGCGCCTCG TGCGTCTGCA .ATTCGTTAC
601 GCTTTCGACA GGGGCGACGC GTTGCAGGTT CTGGCAAAAA CCGAAAAACT
651 TTCCAAGGCG GGCGCGTTGG GCAAATCGGA AATGGAACGG TATCAAAATT
701 GGGCATATCC GTCGCCAGCT GGCGGATGCT GCCGATGCCG CCGCTTTGAA
751 AACCTGCCTG AAGCGGATTC CCGACAGCCT CAAAAACGGG GAATTGAGCG
801 TATCGGTTGC GGAAAAGTAC GAACGTTTGG GACTGTATGC CGATGCGGTC
851 AAATGGGTCA AACAGCATTA TCCGCAsAAC CGCCGCCCCG AGCTTTTGGA
901 AGCCTTTGTC GAAAGCGTGC GCTTTTTGGG CGAGCGCGAA CAGCAGAAAG
951 CCATCGATTT TGCCGATGCT TGGCTGAAAG AACAGCCCGA TAACGCGCTT
1001 CTGCTGATGT ATCTCGGTCG GCTCGCCTTC GGCCGCAAAC TTTGGGGCAA
1051 GGCAAAAGGC TACCTTGAAG CGAGCATTGC ATTAAAGCCG AGTATTTCCG
1101 CGCGTTTGGT TCTAACAAAG GTTTTCGACG AAATCGGAGA ACCGCAGAAG
1151 GCGGAGGCGC AC...

This corresponds to the amino acid sequence <SEQ ID 750; ORF100>:

1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI
51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GXKXXLALNK AGLAYFEGRF
101 EKAELEASRV LVNKVGRDNR TLALMLXAHA AGQMENIXXR DRYLAEIAKL
151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLXIRYA
201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT
251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP XNRRPELLEA
301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AFGRKLWGKA
351 KGYLEASIAL KPSISARLVL TKVFDEIGEP QKAEAH...

Further work revealed the complete nucleotide sequence <SEQ ID 751>:

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT
51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC
101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT
151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GCGTACTCAA
201 TATCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG
251 CCGCGCTTGC CTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT
301 GAAAAGGCGG AACTAGAAGC CTCACGCGTG TTGGTCAACA AAGAGGCCGG
351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCC GCCGGACAGA
401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG
451 CCGGAAAAAC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT
501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA
551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT
601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAACTTTC
651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG
701 CATACCGCCG CCAGCTGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC
751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC
801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT
851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC
901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT
951 CGATTTTGCC GATGCTTGGC TGAAAGAACA gcccgataac gcgcttctgc
1001 TGATGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA
1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG
1101 TTTGGTTCTA GCAAAGGTTT TCGACGAAAT CGGAGAACCG CAGAAGGCGG
1151 AGGCGCAGCG CAACTTGGTT TTGGAAGCCG TCTCCGATGA CGAACGTCAC
1201 GCAGCGTTAG AGCAGCATAG CTGA

This corresponds to the amino acid sequence <SEQ ID 752; ORF100-1>:

1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI
51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GRKAALALNK AGLAYFEGRF
101 EKAELEASRV LVNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL
151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA
201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT
251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA
301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AYGRKLWGKA
351 KGYLEASIAL KPSISARLVL AKVFDEIGEP QKAEAQRNLV LEAVSDDERH
401 AALEQHS*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF100 shows 93.5% identity over a 386 aa overlap with an ORF (ORF100a) from strain A of N. meningitidis:

10 20 30 40 50 60 

or f 100 . pep MCTVWIVVLFAAAVGUUASGIYTGI^ 

pp minim ii 1 1 mi mi i m 1 1 1 1 n inn i n 1 1 1 1 1 n i » i in ii 

orflOOa .SJ^iwLFAAAXGiA^SGIXTGDVYIVLG 

70 80 90 100 110 120 

or f 100 . pep FIIGVLNIPEKMQRFGSARKGXKXXIALNKAGIAYFEGRFEKAELE^ 



t| II I I II I I II I II III I I I t t I I 1 t 1 I I I f I I 1 I I 1 ■ I t I t I I 1 II -■ in 
I I I I I I I 1 I I I I I 1 _ - i — : — - - - LiJijy^QLAYFEGRFEKAELEASRVLGNKEAGDKI 

90 100 110 12< 

130 140 150 160 170 180 

orf 100 . pep TIAI^XAHAAGQMENIXXRDRYIAEIAKLPEK^ 



45 orf i00a FUGV^XPEKMQRrcSARKGR^ 



50 orflOOa TLAlJlLGAH^GCHffiNIELRDRYIAEIAKLPEKQQLSRYLLLAESALN 

E 130 140 150 160 170 180 



190 200 210 220 230 240 

55 orflOO.pep AAAKMNANLTRLVRIOCIRYAFDRGDAI^VLAOTEKLSS^GALGKSE 

" " i urn inn mi : 1 1 1 n n 1 1 1 1 1 n 1 1 1 1 inn iniiiiiiiiimii 

orflOOa AAA^ANLTRLVR^ 

250 260 270 280 290 300 

DAADAAALKTClJCRIPDSLKNGELSVSVAEKYERlfGLYADAVKWVKQHYPXNRRPELLEA 

till 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ■ 1 1 1 > 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ii mil 

orflOOa DAADAAALCTClJ^ 

250 260 270 280 290 300 



60 



orflOO.pep 
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310 320 330 340 350 360 

orflOO oep FVESVRFI^EREQQKAIDFADAWI^^ 

orflOOa FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYU3RIAYGRKLWGKAKGYIXASIAL 
310 320 330 340 350 360 

370 380 
or f 100. pep KPS I S ARLVLTKV FDE I GE PQKAEAH 
| | | ! I I I I I I : I I I I I I I I I I I I I : 
orflOOa KPSISARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSAETHX 

370 380 390 400 

The complete length ORF100a nucleotide sequence <SEQ ID 753> is:

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CNNTCGGGCT 

51 GGCATTGGCG TCGGGCATTN ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CCTGTTCAAA TTCATCATCG GCGTACTCAA 

201 TANCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG 

251 CCGCGCTTGC TTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT 

301 GAAAAGGCGG AACTTGAAGC CTCGCGCGTA TTGGGAAACA AAGAGGCGGG 

351 GGATAACCGG ACTTTGGCAT TGATGTTGGG CGCACATGCC GCCGGGCAGA 

401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 

451 CCGGAAAAGC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT

501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT 

601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAANTTTC 

651 CAAGGCGGGC GCGTNGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGCTGNCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GACCCGAACT TTTGGAAGCN 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAA CGCGATCAGC AGAAAGCCAT 

951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAT GCGCTTCTGC 

1001 TGANGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG 

1101 TTTGGTTCTG GCAAAGGTTT TTGACGAAAC CGGAGAACCG CAGAAGGCGG 

1151 AGGCGCAGCG CAACTTGGTT TTGGCAAGCG TTGCCGAGGA AAACCGNCCT 

1201 TCCGCCGAAA CCCATTGA 

This encodes a protein having amino acid sequence <SEQ ID 754>: 

1 MKTVVWIVVL FAAAXGLALA SGIXTGDVYI VLGQTMLRIN LHAFVLGSLI

51 AVVVWYFLFK FIIGVLNXPE KMQRFGSARK GRKAALALNK AGLAYFEGRF

101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKXSKAG AXGKSEMERY QNWAYRRQLX DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVESVRFLGE RDQQKAIDFA DAWLKEQPDN ALLLXYLGRL AYGRKLWGKA 

351 KGYLEASIAL KPSISARLVL AKVFDETGEP QKAEAQRNLV LASVAEENRP 

401 SAETH* 

ORF100a and ORF100-1 show 95.1% identity in 406 aa overlap:

10 20 30 40 50 60 

orflOOa pep ^HCTVWIWLFAAAXGIJUjASGIXTGDVYIVLGQTM 

t } | | I I I I I llllllll IIIIIIMMMMMIMMMMMMMMMI 

erf 100-1 MKT^A^IVVLFAAAVG1JUASGIYTGDVYI^^GQTMLRINLHAFVI/3SLIA^AA/WYFL^ 

10 20 30 40 50 60 

70 80 90 100 110 120 

orflOOa pep FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 
| ( I I I I I 1 I I I 1 I I I I I I 1 1 I I I I 1 I I I I 1 I I I I I I I I I I I 1 1 1 I I I I I I MINIM 
orf 100-1 FIIGVU4IPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 

70 80 90 100 110 120 

130 140 150 160 170 180 

orflOOa pep TLAI*!IX5AJiAAGQMENIEIJlDRYLAEIAKLP£KQQLSRYL 

|M I I I M 1 M 1 I I I I I ! 1 I I M I M I I M M II I I I M I I I I I I I I I I I I I I 

orf 100-1 T LALMLG AHAAGQMEN I ELRDR YLAE I AKL PE KQQL SR Y LLIAE S ALN RRD YEAAEAN LH 
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130 



140 



150 



160 
220 



1*70 



180 
240 



10 



15 



20 



25 



orflOOa.pep 
orfl00-l 

orflOOa.pep 
orfl00-l 

orflOOa.pep 
orfl00-l 

orflOOa.pep 
orfl00-l 



190 200 210 220 230 

AAAKMNANLTRLVRLQLRYAH^^ 

llltll II I I I t I I I I M f 1 1 I 1 1 1 Mill I MM MMIIIMIMMMI 

aaa^anlW 

200 210 220 230 240 



190 



300 



250 260 270 280 290 

DAADAAALKT CLKRI P DS LKNGELSVS VAEKYERLGLYADAVKWVKQHY PHNRRPELLEA 
i t M M M I M 1 I I I I H M 1 I II 1 M H M H I II I I It 1 I I I I I t M M H I I I M U 
DAADAAALKTCLKRI PDSLKNGELSVS VAEKYERLGLYADAVKWVKQHY PHNRRPELLEA 
260 270 280 290 300 



250 



360 



310 320 330 340 350 

FVESVRFLGERDQQKAI DFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEAS IAL 

1 1 1 M I I I I I 1 : I MM MMMMMMMMI I II M I M I I M I 11 I Mil MM 
FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 

310 320 330 340 350 360 

370 380 390 400 

KPSISARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSA-ETHX 

MIMIMMMMM MIMIMIMIM :|::::l Mil 
KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX 

370 380 390 400 



Homology with a predicted ORF from N. gonorrhoeae

ORF100 shows 93.3% identity over a 386 aa overlap with a predicted ORF (ORF100ng) from N. gonorrhoeae:

orflOO.pep 

orflOOng 

orflOO.pep 

orflOOng 

orflOO.pep 

orflOOng 

orflOO.pep 

orflOOng 

orflOO.pep 

orflOOng 

orflOO.pep 

orflOOng 

orflOO.pep 

orflOOng 



60 



MKTVVWIWLFAAAVGLAIJ^GIYTGDVYIVLGQTMLRINIJlAF\fl^ 

IIMIMIMMMMIM M I I M M M I I M I II II I M M M I I I I M II 

MKTVVWIWLFAAAVGUUJ^GIYTGDWIVl^QTMLRINLHAF^GSLIAWWYFLFK 

FI IGVLNI PEKMQRFGSARKGXKXX1ALNKAGIAYFEGRFEKAELEASRVLVNKVGRDNR 
MMIIMII:l:l MM II I M I M I 1 1 I M I II M M I M M II I II : Ml 
FI IGVLNI PENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

TlJJ^LXAHAAGQMENIXXRDRYl^IAKLPEKQQLSRYLLIJ^SALNRRDYEAAEANLH 

IIMH MIMIIMI MIIIMMMIIMMIIIMMIIIIIMMMMMII 
TIAI>ILGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

AAAKMNANLTRLVRLXIRYAFDRGDALQVIAKTEKLSKAGALGKSEMERYQNWAYRRQLA 
M I 1 1 1 I I M I M II : M M I I M I I I 1 1 1 1 M M M I M I M I I M II M I 1 11 I I M 
AAAKMN AN LT RLVRLQLRYAFDRG DALQVLAKTEKL S KAG ALGKS EME RYQN W AYRRQMA 

DAADAAAUCTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPEL1£A 300 

M MMMMMMMMI MM I I I I I III M I I I i I I I I I I I MM M Mill MM 
DAADAAALKTCLKRI PDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 300 



60 



120 



120 
180 



180 



240 



240 



FVESVRFLGEREQQKAlDFADAWLKEQPDNALLLMYLGRIAEtnUCLWGKTVKGYLEASIAL 
MMMIIMIIMM II I M M II 1 1 II 1 1 1 1 II I II I II : I I I M M I M I M 1 1 1 II 
F\^SVRFI^EREQQKAIDFADSWLKEQPDNALLIKfLGRLAYGRKLWGKAKGYLEASIAL 



360 



360 



KPS I SARLVLTKV FDE IGE PQKAEAH 
MM IIIICIIHI - Mill: 

KPS IPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETR 



386 
405 



The complete length ORF100ng nucleotide sequence <SEQ ID 755> is:

1 ATGAAAACGG TAGTCTGGAT TGTTGTCCTG TTTGCCGCCG CCGTCGGACT
51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC
101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT
151 GCCGTCGTGG TGTGGTATTT CCTGTTTAAA TTCATCATCG GCGTACTCAA
201 TATCCCCGAA AATATGCGGC GTTCCGGTTC GGCGCGGAAA GGCCGCAAGG
251 CCGCGCTTGC CTTGAATAAG GCGGGTTTGG CGTATTTCGA AGGGCGTTTT
301 GAAAAGGCGG AACTCGAAGC CTCTCGAGTG TTGGGCAACA AAGAGGCCGG
351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCG GCAGGACAGA
401 TGGAAAATAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG
451 CCGGAAAAAC AGCAGCTTTC CCGCTATCTT CTGCTGGCGG AATCGGCGTT
501 AAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA
551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCC
601 TTCGATCGGG GCGATGCGTT GCAGGTTCTG GCAAAAaccG AAAAACTTTC
651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG
701 CATACCGCCG CCAGATGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC
751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGagcGTATC
801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT
851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC
901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT
951 CGATTTTGCC GATTCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC
1001 TGATGTATCT CGGCCGGCTC GCCTACGGCC GCAAACTTTG GGGTAAGGCA
1051 AAAGGCTACC TTGAAGCGAG TATTGCACTG AAGCCGAGTA TTCCGGCGCG
1101 TTTGGTGTTG GCAAAGGTTT TTGACGAAAC CGCACAGTCG CAAAAAGCCG
1151 AAGCACAGCG CAACTTGGTT TTGGCAAGCG TTGCCGGGGA AAACCGCCCT
1201 TCCGCCGAAA CCCGTTGA



This encodes a protein having amino acid sequence <SEQ ID 756>:

1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI
51 AVVVWYFLFK FIIGVLNIPE NMRRSGSARK GRKAALALNK AGLAYFEGRF
101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL
151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA
201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQMA DAADAAALKT
251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA
301 FVESVRFLGE REQQKAIDFA DSWLKEQPDN ALLLMYLGRL AYGRKLWGKA
351 KGYLEASIAL KPSIPARLVL AKVFDETAQS QKAEAQRNLV LASVAGENRP
401 SAETR*



ORF100ng and ORF100-1 show 95.3% identity in 402 aa overlap:



10 20 30 40 50 60 

orf 100-1. pep MKTVWIWLFAAAVG1JUASGIYTGDVYIVLGQTMLRIKLHAFVLGSLIAVVVWYFLFK 

Minn miiiiiiHmiiHMiiiimimiiiitmmiiiif mini 

orflOOnq MKTVWIWLFAAAVG1ALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 100-1 pep FI IGVLN I PEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 
* * MM MM II: 1:1 I I II 1 I I I II II M II I I I I M I I M I I 1 I I I M M II HUM 
orflOOnq FIIGVLNIPENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 1 00- 1 pep TLALMLGAHAAGQMEN IELRDRYLAE I AKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
Ml I I M M I 1 1 I I I II M I tt I M I I 1 I II 11 M It II I 1 11 1 1 I 1 1 I M M I ! I 1 1 I I 
orflOOnq TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 100- 1 pep AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQKWAYRRQLA 
I I I I II I M M I I II I I II I I I i I I I I M I M I i M It I 1 I I I II I I II I I I I I I I I I : I 
orflOOnq AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 100-1. pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
| It M M I II I I I I II I I I I I I I I I I M I I I I 1 1 M 11 I II I I I t I I I I I I i M I II I I I 
O r f 1 0 Onq DAADAAALKTCLKRI PDSLKNGELS VS VAEKYERLGLYADAVKWVKQH Y PHNRRPELLE A 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 100-1. pep FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 
Ml Ml 1 1 I I I I 1 1 1 ! 1 1 1 1 1 1 1 1 M t 1 1 I I ! I 1 1 I 1 1 1 I I M ! I I 1 1 I 1 I 1 1 f I 1 1 1 I I 
o r f 1 0 Ona FVES VRFLGEREQQKAI D FAD S WLKEQ P DN AL LLMY LG RLA YGRK LWGKAKG YLE AS I AL 

310 320 330 340 350 360 

370 380 390 400 

orf 100-1 .pep KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX 

MM I I M II I 1 I I I : : I I I i I I I I I I I : I : : : » : 1 
or f 1 0 On KPS I PARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPS AE7RX 






370 380 390 400

Based on this analysis, including the presence of a putative leader sequence, a putative
transmembrane domain, and an RGD motif, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
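
The RGD motif referred to above can be located by a plain substring search of the protein sequence. The following Python sketch is illustrative only and is not the analysis software used for this example; the helper name `find_motif` is hypothetical, and the sequence is ORF100-1 <SEQ ID 752> as listed above with the stop codon omitted.

```python
def find_motif(protein: str, motif: str) -> list[int]:
    """Return the 1-based start position of every occurrence of motif."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to residue number
        start = protein.find(motif, start + 1)
    return positions


# ORF100-1 <SEQ ID 752>, written in the same blocks of ten as the listing.
ORF100_1 = "".join("""
MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI
AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GRKAALALNK AGLAYFEGRF
EKAELEASRV LVNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL
PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA
FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT
CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA
FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AYGRKLWGKA
KGYLEASIAL KPSISARLVL AKVFDEIGEP QKAEAQRNLV LEAVSDDERH
AALEQHS
""".split())

print(find_motif(ORF100_1, "RGD"))
```

In this sequence the search reports a single occurrence, starting at residue 203 (the RGD within FDRGDALQVL).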

Example 90 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 757>:

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATsTGGT CGTGTTCAAA CCGTTTTGA 

This corresponds to the amino acid sequence <SEQ ID 758; ORF102>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYXVVFK PF*

Further work revealed the complete nucleotide sequence <SEQ ID 759>: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This corresponds to the amino acid sequence <SEQ ID 760; ORF102-1>:

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF*
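
The correspondence between a nucleotide sequence and its amino acid sequence can be reproduced by codon-by-codon translation. The sketch below is a minimal illustration in Python rather than the software used for this work. It assumes the standard genetic code and reading frame 1, and it reports any codon containing an ambiguous base (for example the lower-case `s` in SEQ ID 757) as X. The names `translate` and `CODON_TABLE` are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA sequence; '*' marks a stop codon.

    Whitespace is ignored. Codons containing anything other than
    A, C, G or T (after upper-casing) are reported as 'X'.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# Complete ORF102 nucleotide sequence <SEQ ID 759>.
ORF102_1_DNA = """
ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG
GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA
TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG
GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT
CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC
ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC
GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG
CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC
TGTATCTGGT CGTGTTCAAA CCGTTTTGA
"""

print(translate(ORF102_1_DNA))
```

Translating SEQ ID 759 in this way gives a 142-residue protein followed by a stop, which is the ORF102-1 sequence listed above.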

Computer analysis of this amino acid sequence gave the following results: 

Homology with HP1484 hypothetical integral membrane protein of H. pylori (accession number AE000647)

ORF102 and HP1484 show 33% aa identity in 143aa overlap: 

orf102 3 FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPLGF 62
F W K FH+ VISW A LFYLPR+FV A + V++ +LY F++
HP1484 8 FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGVVQIQEK KLYSFIASPAM 65

orf102 63 GAVVFGAAIPFAAG WWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWY 119
G + + + GW+H KL L ++LLAY YC +R + + R+Y
HP1484 66 GFTLITGILMLLIEPTLFKSGGWLHAKLALVVLLLAYHFYCKKCMRELEKDPTRRNARFY 125

orf102 120 RVFNEIPXXXXXXXXXXXXFKPF 142
RVFNE P KPF
HP1484 126 RVFNEAPTILMILIVILWVKPF 148



Homology with a predicted ORF from N. meningitidis (strain A)

ORF102 shows 99.3% identity over a 142 aa overlap with an ORF (ORF102a) from strain A of N. meningitidis:

                    10        20        30        40        50        60
orf102.pep  MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102a     MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf102.pep  GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102a     GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
                    70        80        90       100       110       120

                   130       140
orf102.pep  VFNEIPVLLMVAALYXVVFKPFX
            ||||||||||||||| |||||||
orf102a     VFNEIPVLLMVAALYLVVFKPFX
                   130       140

The complete length ORF102a nucleotide sequence <SEQ ID 761> is:

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG
51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA
101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG
151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT
201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC
251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC
301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG
351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC
401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA

This encodes a protein having amino acid sequence <SEQ ID 762>:

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF*

ORF102a and ORF102-1 show complete identity in 142 aa overlap:

                    10        20        30        40        50        60
orf102a.pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102-1    MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf102a.pep GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102-1    GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
                    70        80        90       100       110       120

                   130       140
orf102a.pep VFNEIPVLLMVAALYLVVFKPFX
            |||||||||||||||||||||||
orf102-1    VFNEIPVLLMVAALYLVVFKPFX
                   130       140



Homology with a predicted ORF from N. gonorrhoeae

ORF102 shows 97.9% identity over a 142 aa overlap with a predicted ORF (ORF102ng) from N. gonorrhoeae:






orf 102. pep 
orf 102ng 
orf 102. pep 
orf 102ng 
orf 102. pep 



MMFSWFKLFHLFFVI SWFAGLFYLPRI FWMAMI DVpRGNPEYVRLSGMAVRLYREWSPL 60 

GFGAWFGAAI PFAAGWWGSGWVHVKLCLGLMLIAtfQLYCGVLLRRFQDYSl^F^ 120 

G^Wrc^IPFAA^WGS^ 

VFNE I PVLLMVAALYXWFKPF 142 
III lltllltlllll llllll 
VFNEI PVLLMVAALYLWFKPF 142 



120 



orf 102ng 

The complete length ORF102ng nucleotide sequence <SEQ ID 763> is:

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG
51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA
101 TTGATGCGCC GCGCGGCAAT CCCGAGTATG TGCGCCTGTC GGGGATGGCG
151 GTGCGGTTGT ACCGTTTTAT GTCGCCTTTG GGTTTCGGCG CGGTCGTGTT
201 CGGCGCGGCG ATACCGTTTG CCGCcggccg GTGGGGCagc ggctggGTTC
251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTATCA GTTGTATTGC
301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG
351 CTGGTACCGC GTGTTCAAcg aAATCCCCGT GCTGCTGATG GTTGCCGCGC
401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA



This encodes a protein having amino acid sequence <SEQ ID 764>:

! MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAM IDAPRGN PEYVRLSGMA 
51 VRLYRFMSP L GFGAWFGAA IPFAAG RWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNE IPVLLM VAALYLWFK P F» 
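
The correspondence between a nucleotide listing and the protein it is said to encode can be checked by translating codons with the standard genetic code. The following is a minimal sketch, not the procedure used in the specification: the function name `translate` and the choice to report unrecognised codons as `X` are illustrative assumptions. It is applied to the first 30 bases and the last 9 bases of SEQ ID 763.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon.

    Spaces are ignored and lower-case bases are treated like upper-case ones.
    Codons containing anything other than A, C, G, T are reported as 'X'
    (an assumption made for this sketch).
    """
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # drop a trailing partial codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First three 10-base blocks and the final block of SEQ ID 763.
print(translate("ATGATGTTTT CTTGGTTCAA GCTGTTTCAC"))
print(translate("CCGTTTTGA"))
```

The first call reproduces the opening residues of SEQ ID 764, and the second shows the sequence ending in a stop codon.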

ORF102ng and ORF102-1 show 98.6% identity in 142 aa overlap: 

10 20 30 40 50 60 

orf 102-1 pep MMFSWFKLFHLFFVI SWFAGLFYLPRI FVNMAMI DVPRGN PE YVRLSGMAVRLYRFMS PL 
' P F | » | | 1 1 1 I 1 1 Ml M I M II 1 1 M I II 1 1 1 1 M M • M I I M M 1 1 1 1 1 1 M M I MM I 
orfl02ng MMFSWFKLFHLFFVI SWFAGLFYLPRI FVNMAMI DAPRGN PE YVRLSGMAVRLYRFMS PL 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 102-1 Pep GFGAWFGAAI PFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFS HRWYR 
P | | | 1 | I I I I | 1 1 1 I 1 1 I I 1 I I I I I I I I I I 1 I I I I 1 I I I I t I 1 1 I I 1 t I I 1 I I 1 MMII 

orfl02nq GFGAWFGAAI PFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

70 80 90 100 110 120 

130 140 
orf 102-1. pep VFNEI PVLLMVAALYLWFKP FX 
I 1 I I I I M I M I I M I II I I I M 
orfl02ng VFNEI PVLLMVAALYLWFKP FX 

130 140 
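
Percent identity figures such as the one quoted for this alignment are the number of identical aligned positions divided by the overlap length. The sketch below is a simplified illustration, assuming two pre-aligned, equal-length, gap-free strings; `percent_identity` is a hypothetical helper, not a tool named in the specification. The example fragment is residues 31 to 40 of the two sequences, which contains one of the two visible differences (V/A).

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences agree.

    Assumes equal length and no gap characters; a real alignment program
    also has to decide how gaps are counted.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Residues 31-40 of ORF102-1 and ORF102ng (one V/A difference).
print(percent_identity("MAMIDVPRGN", "MAMIDAPRGN"))

# Two differences over the whole 142 aa overlap give the quoted figure.
print(round(100.0 * (142 - 2) / 142, 1))
```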

In addition, ORF102ng shows significant homology to a membrane protein from H. pylori:

gi|2314656 (AE000647) conserved hypothetical integral membrane protein
[Helicobacter pylori] Length = 148
Score = 79.2 bits (192), Expect = 1e-14
Identities = 50/147 (34%), Positives = 68/147 (46%), Gaps = 13/147 (8%)

Query 3 FSWFKLFHLFFVI SWFAGLFYLPRI FVNMAMI DAPRGN PE YVRLSGMAVRLYRFMS PLGF 62 

F W K FH+ VISW A LFYLPR+FV A + V++ +LY F++ 

SbjCt: 8 FLWVTCAFHVIAVISWMAALFYXPRLFv^HAENAHKKEFVGVVQIQEK KLYSFIASPAM 65 

Ouerv 63 GAWFGAAIP FAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFS 115 

w G + + F +G GW+H KL L ++LLAY YC +R + + 
Sbjcf 66 GFTLITGILMLLIEPTLFKSG GWLHAKLALWLLLAYH FYCKKCMRELEKDPTRRN 121 



60 



Query: 116 HRWYRVFNEIPXXXXXXXXXXXXFKPF 142 

R+YRVFNE P KPF 
SbjCt: 122 ARFYRVFNEAPTILMILIVILVWKPF 148 
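
The summary line of a database hit of this kind can be read mechanically and its percentages re-derived from the raw counts. The following sketch assumes the summary has the form "Name = n/d (p%)"; the helper `parse_summary` and the regular expression are illustrative assumptions, not part of any search program. For the three figures quoted for the H. pylori hit, the printed percentages agree with integer division of the counts.

```python
import re

SUMMARY = "Identities = 50/147 (34%), Positives = 68/147 (46%), Gaps = 13/147 (8%)"

# One "Name = n/d (p%)" field; the names and layout are taken from the line above.
FIELD = re.compile(r"(\w+) = (\d+)/(\d+) \((\d+)%\)")


def parse_summary(line: str) -> dict:
    """Return {name: (count, total, printed_percent)} for each field found."""
    return {
        name: (int(count), int(total), int(percent))
        for name, count, total, percent in FIELD.findall(line)
    }


for name, (count, total, printed) in parse_summary(SUMMARY).items():
    # Compare the printed percentage with floor(100 * count / total).
    print(name, count, total, printed, 100 * count // total)
```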






Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 91 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 765>:

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC
51 GGTTTGGGGC GGATGGTCTT AACTGAAGCC CGAGCCGCAC GTGCTTGATA
101 TTACGGAAAC GGTCAGGCGC GGC //

//.. ATTTCGTTTA CGATTTTGTC CGAACCGGAT ACGCCGATTA AGGCGAAGCT
51 CGACAGCGTC GACCCCGGGC TGACCACGAT GTCGTCGGGC GGTTACAACA
101 GCAGTACGGA TACGGCTTCC AATGCGGTCT ACTATTATGC CCGTTCGTTT
151 GTGCCGAATC CGGACGGCAA ACTCGCCACG GGGATGACGA CGCAGAATAC
201 GGTTGAAATC GACGGCGTGA AAAATGTGCT GATTATTCCG TCGCTGACCG
251 TGAAAAATCG CGGCGGCAAG GCGTTTGTGC GCGTGTTGGG TGCGGACGGC
301 AAGGCGGCGG AACGCGAAAT CCGGACCGGT ATGAGAGACA GTATGAATAC
351 CGAAGTAAAA AGCGGGTTGA AAGAGGGGGA CAAAGTGGTC ATCTCCGAAA
401 TAACCGCCGC CGAGCAACAG GAAAGCGGCG AACGCGCCCT AGGCGGCCCG
451 CCGCGCCGAT AA

This corresponds to the amino acid sequence <SEQ ID 766; ORF85>:

1 MAKMMKWAAV AAVAAAAVWG GWS.LKPEPH VLDITETVRR G
51
101
151
201 I SFTILSEPDT
251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG
301 MTTQNTVEID GVKNVLIIPS LTVKNRGGKA FVRVLGADGK AAEREIRTGM
351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR*

Further work revealed the further partial nucleotide sequence <SEQ ID 767>:

1 ..GTATCGGTCG GCGCGCAGGC
51 ACTCGGGCAA CAGGTTAAAA
101 CCTCGCAGAC CAATACGCTC
151 CAGGCGAAGC TGGTGTCGGC
201 ATATAAGCGT CAGGCGGCGT
251 ATTTGGAAAG CGCGCAGGAT
301 GAGCTGAAGG CTTTAATCAG
351 GTCGGAATTG GGCTACACGC
401 TGGCGATTCT CGTGGAAGAG
451 CCGACGATTG TCCAATTGGC
501 GATTGCCGAG GGCGATATTA
551 TTACGATTTT GTCCGAACCG
601 GTCGACCCCG GGCTGACCAC
651 GGATACGGCT TCCAATGCGG
701 ATCCGGACGG CAAACTCGCC
751 ATCGACGGCG TGAAAAATGT
801 TCGCGGCGGC AAGGCGTTTG
851 CGGAACGCGA AATCCGGACC
901 AAAAGCGGGT TGAAAGAGGG
951 CGCCGAGCAA CAGGAAAGCG
1001 GATAA

This corresponds to the amino acid sequence <SEQ ID 768; ORF85-1>:

1 ..VSVGAQASGQ IKILYVKLGQ QVKKGDLIAE INSTSQTNTL NTEKSKLETY
51 QAKLVSAQIA LGSAEKKYKR QAALWKENAT SKEDLESAQD AFAAAKANVA
101 ELKALIRQSK ISINTAESEL GYTRITATMD GTVVAILVEE GQTVNAAQST
151 PTIVQLANLD MMLNKMQIAE GDITKVKAGQ DISFTILSEP DTPIKAKLDS
201 VDPGLTTMSS GGYNSSTDTA SNAVYYYARS FVPNPDGKLA TGMTTQNTVE
251 IDGVKNVLII PSLTVKNRGG KAFVRVLGAD GKAAEREIRT GMRDSMNTEV
301 KSGLKEGDKV VISEITAAEQ QESGERALGG PPRR*

Computer analysis of this amino acid sequence gave the following results:





Homology with a predicted ORF from N. meningitidis (strain A)

ORF85 shows 87.8% identity over a 41 aa overlap and 99.3% identity over a 153 aa overlap with
an ORF (ORF85a) from strain A of N. meningitidis:

10 20 30 4° 

5 orf85 pep MAKMMKWAAVAAVAAAAVWGGW S - LKPE PHVLDITETVRRG 

orf85a jLUUJ^^lU^ 

10 20 30 40 50 

in 80 90 100 

1 U ... ISFTILSEPDTPIKAKLDSVDPGLTTMSSG 

orf85.pep | | || | | || | 1 1 I I I I I I I I I I I I I I I I M I 

orf85a T I VQLANLDMMLNKMQIAEGDITKVKAGQDI S FTI LSEPDT PIKAKLDSVDPGLTTMSSG 

210 220 230 240 250 260 

* 5 H0 120 130 140 150 160 

orf85.pep gyNSSTDTASNAVYYYARSFVPNPDGKIATGMTTQNTVEI 

orf85a QYI^STDTMI^ 
20 270 280 290 300 310 320 

170 180 190 200 210 220 

orf85 oep AFVRVUMGKAAEREI^ 

orf85.pep | |[ I I It I I U 1 1 M I I I 1 1 1 1 1 H I I I I I I I 1 1 1 I M I I II I I M I I I I I Hill I Ml 
A^RVLGADGKA^ 
330 340 350 360 370 380 



25 orf85a 



230 

or f 8 5. pep PRRX 

30 till 

orf85a PRRX 
390 

The complete length ORF85a nucleotide sequence <SEQ ID 769> is:

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC
51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAGCCGCAG GCTGCTTATA
101 TTACGGAAAC GGTCAGGCGC GGCGACATCA GCCGGACGGT TTCTGCAACA
151 GGGGAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCATCGGG
201 GCAGATTAAG AAACTTTATG TCAAACTCGG GCAACAGGTT AAAAAGGGCG
251 ATTTGATTGC GGAAATCAAT TCGACCTCGC AGACCAATAC GCTCAATACG
301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT
351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA
401 AGGATGATGC GACCGCTAAA GAAGATTTGG AAAGCGCACA GGATGCGCTT
451 GCCGCCGCCA AAGCCAATGT TGCCGAGCTG AAGGCTCTAA TCAGACAGAG
501 CAAAATTTCC ATCAATACCG CCGAGTCGGA ATTGGGCTAC ACGCGCATTA
551 CCGCAACGAT GGACGGCACG GTGGTGGCGA TTCTCGTGGA AGAGGGGCAG
601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT
651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG
701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG
751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC
801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTACT
851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG
901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGCTGAT
951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAGGGCG TTTGTGCGCG
1001 TGTTGGGTGC AGACGGCAAG GCGGCGGAAC GCGAAATCCG GACCGGTATG
1051 AGAGACAGTA TGAATACCGA AGTAAAAAGC GGGTTGAAAG AGGGGGACAA
1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC
1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA

This encodes a protein having amino acid sequence <SEQ ID 770>:

1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITETVRR GDISRTVSAT
51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STSQTNTLNT
101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATAK EDLESAQDAL
151 AAAKANVAEL KALIRQSKIS INTAESELGY TRITATMDGT VVAILVEEGQ
201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT
251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG
301 MTTQNTVEID GVKNVLIIPS LTVKNRGGRA FVRVLGADGK AAEREIRTGM
351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR*

ORF85a and ORF85-1 show 98.2% identity in 334 aa overlap: 

30 40 50 60 70 80 

PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 

1 1 1 1 1 1 1 1 1 1 1 1 iiiiiiiimiiiiii 

VSVGAQASGQIKILYVKLGQQVKKGDLIAE 
10 20 .30 

90 100 110 120 130 140 

INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATAKEDLESAQD 

I t 1 I I I 1 I 1 I I 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 I t I I I I I I 1 1 I 1 1 I I ! ^ 1 I I = : I I = 1 1 1 I I i 1 I i 
INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD 
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150 160 170 180 190 200 

A1AAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTVVAILVEEGQTVNAAQST 
| : I | I I I I M t I I I I I I t I I t I I i I I t I I I I II I I I 1 I I I I I 1 I I I I t I 1 1 I I t I I! I I I 
AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 
100 110 120 130 140 150 

210 220 230 240 250 260 

PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
I I 1 1 1 1 I 1 1 I I I 1 1 1 1 1 1 1 11 I I I II I I I I I I 1 1 t 1 1 I I I II I I I 1 1 II I M I I M I I I I 
PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
160 170 180 190 200 210 

270 280 290 300 310 320 

GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGG 
| | | | | | | | | | I | I I II I I I I I I I I I I I I M I I I I I I I I I I I I I I I II I I I I I 1 I I I I I I I 
GG YN S ST DTASN AV YY Y ARS FVPN PDGKLATGMTTQNT VE I DG VKNVLI I PS LT VKN RGG 
220 230 240 250 260 270 

330 340 350 36C 370 380 

RAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 

: | I 1 1 | | | M I | I I I I II I I I I 1 I I I I I I I I I I I I I I I I I I I I I! I II I I I I I I I II I I I 
KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 
280 290 300 310 320 330 

390 
PPRRX 
I I I I I 
PPRRX 

Figure 19D shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF85a.
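
Profiles of this kind are usually computed by averaging a per-residue scale over a sliding window. The specification does not state which scale or window was used for Figure 19D, so the sketch below substitutes the Kyte-Doolittle hydropathy scale with a 9-residue window purely as an illustration; note that hydropathy has the opposite sign convention to hydrophilicity (positive values indicate hydrophobic stretches). The function name `hydropathy_profile` is hypothetical. The input is the first 30 residues of SEQ ID 770.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# First 30 residues of ORF85a (SEQ ID 770).
n_terminus = "MAKMMKWAAVAAVAAAAVWGGWSYLKPEPQ"
profile = hydropathy_profile(n_terminus)

# Report the start position (1-based) and score of the most hydrophobic window.
peak = max(range(len(profile)), key=profile.__getitem__)
print(peak + 1, round(profile[peak], 2))
```

A window rich in alanine and valine scores well above zero, whereas a window of charged residues scores below zero, which is why such plots are used to flag candidate leader or membrane-spanning segments.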

Homology with a predicted ORF from N. gonorrhoeae

ORF85 shows a high degree of identity with a predicted ORF (ORF85ng) from N. gonorrhoeae:
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MAKMM KWAAV AAVAAAAVWGGW S . LKPEPHVLDITETVRRG 
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ORF85ng 
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MAKMMKWAAVAAVAAAAVWGGWSYLKPEPQAAYITEAVRRGDISRTVSAT 


50 


0RF85 






250 
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ORF85ng 
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TVRAAQST PT I VQLANLDMMLNKMQI AEGDITKVKAGQDI S FT I LSE PDT 


250 


ORF85 


251 


PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 


300 
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ORF85ng 


251 


PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 
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ORF85 
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ORF85ng 


301 


MTTQNTVE I DGVKNVLLI PSLTVKNRGGKAFVRVLGADGKAVERE IRTGM 


350 


ORF85 


152 


RDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGPPRR 393 
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351 


KDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGPPRR 393 
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The complete length ORF85ng nucleotide sequence <SEQ ID 771> is: 



1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCaac
51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAACCGCAG GCTGCTTATA
101 TTACGGAaac ggTCAGGCGC GGCGATATCA GCCGGACGGT TTCCGCGACG
151 GgcgAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCTTCGGG
201 GCAGATTAAA AAGCTTTATG TCAAACTCGG GCAACAGGTC AAAAAGGGCG
251 ATTTGATTGC GGAAATCAAT TCGACCACGC AGACCAACAC GATCGATATG
301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT
351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA
401 AGGATGATGC GACCTCTAAA GAAGATTTGG AAAGCGCGCA GGATGCGCTT
451 GCCGCCGCCA AAGCCAATGT TGCCGAGTTG AAGGCTTTAA TCAGACAGAG
501 CAAAATTTCC ATCAATACCG CCGAGTCGGA TTTGGGCTAC ACGCGCATTA
551 CCGCGACGAT GGACGGCACG GTGGTGGCGA TTCCCGTGGA AGAGGGGCAG
601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT
651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG
701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG
751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC
801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTATT
851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG
901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGTTGCT
951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAAGGCG TTCGTACGCG
1001 TGTTGGGTGC GGACGGCAAG GCAGTGGAAC GCGAAATCCG GACCGGTATG
1051 AAAGACAGTA TGAATACCGA AGTGAAAAGC GGGTTGAAAG AGGGGGACAA
1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC
1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA



This encodes a protein having amino acid sequence <SEQ ID 772>:

1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITEAVRR GDISRTVSAT
51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STTQTNTIDM
101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATSK EDLESAQDAL
151 AAAKANVAEL KALIRQSKIS INTAESDLGY TRITATMDGT VVAIPVEEGQ
201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT
251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG
301 MTTQNTVEID GVKNVLLIPS LTVKNRGGKA FVRVLGADGK AVEREIRTGM
351 KDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR*



ORF85ng and ORF85-1 show 96.1% identity in 334 aa overlap:
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INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD 
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AIAAAKANVAELKALIRQSKISINTAESDLGYTRITATMDGTVVAIPVEEGQTVNAAQST 
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AFAAAKANVAEIJCALIRQSKISINTAESELGYTRITATMDGTVVAILVEEGQTVNAAQST 

1X0 120 130 140 150 
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250 



260 



210 220 230 240 

PTIVOIJVNLDMMLNKMOIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 

lliilMMIMilllMIMlllllMIIIMIIIlllMlillllHIMIIIIIItl 

PTIVQLANLDMMLNKMOIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 

" M " 170 180 190 200 210 
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270 280 290 300 

GGYNSST DTASNAVYYYARS FVPNPDGKLATGMTTQNTVE I DGVKNVLLI PSLTVKNRGG 
tl||||||||IMIIIIIIItlllMIIIMIMI!Mllinil!ll:IIIHIIIIII 
GGYN S ST DTASN AVYYY ARS FV PN PDGKLATGMTTQNTVE I DGVKNVLI I PSLTVKNRGG 
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KAFVRVLGADGKAVEREIRTGMKDSMNTEWSGUCEGDKWISEITAAEQQES 

........ t ... . 1 I I I I I I I 1 I I I I 1 I It I I I 1 I II t I I I I I I I M I I 1 



Orf85na KAFVRVLGADGKAVEREIRTGMKUinix i ^ v a^^ourvvv i c^x i 

9 I I I I II I M I I I I - i I I M 1 I I : I I t I I I M I I I I I 1 1 I I | | M I I I I I I 1 > M I I I M I 
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LGADGKAAEREIRT 

280 290 



orf85-l mrVRVLGMGK^ 



390 

orf85ng PP*** 
Mill 

orf 85-1 PPRRX 

In addition, ORF85ng shows significant homology to an E. coli membrane fusion protein:

gi|1787104 (AE000189) o380; 27% identical (27 gaps) to 332 residues from
membrane fusion protein precursor, MTRC_NEIGO SW: P43505 (412 aa) [Escherichia
coli] Length = 380
Score = 193 bits (485), Expect = 2e-48
Identities = 120/345 (34%), Positives = 182/345 (51%), Gaps = 13/345 (3%)

Ouerv 29 PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 88 

V p y T VR GD+ ++V ATG++ V VGAQ SGQ+K L V +G +VKK L- 
Sbjct: 41 PVPTYQTLIVRPGDLQQSVLATGKLDALRKVDVGAQVSGQUCTLSVAIGDKVKKDQLLGV 100 

Ouerv 89 INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEXXXXXXX 148 

V Y * I+ N L ++ L +A+ A+ L A Y RQ L + A S++ 

Sbjct: 101 IDPEQAENQIKEVEATLMELRAQRQQAEAELKLARVTYSRQQRLAQTKAVSQQDLDTAAT 160 

75 Ouerv 149 XXXXXXXXXXXXXXXIRQSKISINTAESDLGYTRITATMDGTWAIPVEEGQTVNAAQST 208 

w Y " I++++ S++TA+++L YTRI A M G V I +GQTV AAQ 

Sbjct: 161 EMAVKQAQIGTIDAQIKRNQASLDTAKTKLDYTRIVAPMAGEVTQITTLQGQTVIAAQQA 220 

Query 209 PTIVQLANLDMMLNKMQIAEGDITKVKAGQOISFTILSEPDTPIKAKLDSVDPGLTTMSS 268 
30 P 1+ LA++ ML K Q++E D+ +K GQ FT+L +P T + ++ VP 

Sbjct: 221 PNILTLADMSAMLVKAQVSEADVIHLKPGQKAWFTVLGDPLTRYEGQIKDVLP 273 

Ouerv 269 GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLLIPSLTVKNRGG 328 
+ + ++A++YYAR VPNP+G L MT Q +++ VKNVL IP + + G 
35 Sbjct: 274 T PEKVN DAI FY YARFEVPN PNGLLRLDMTAQVHI QLT DVKNVLTI PLS ALGDPVG 328 

Query 329 KAFVRV-LGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKWISE 372 

+V L +G+ ERE+ G ++ + E+ GL+ GD+WI E 
Sbjct: 329 DN R YKVKLLRN GETRERE VT I GARN DT DVE I VKG LEAG DE W I G E 373 

Based on this analysis, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF85-1 (40.4 kDa) was cloned in the pGex vectors and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 19A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera were used for Western blot (Figure 19B), FACS analysis
(Figure 19C), and ELISA (positive result). These experiments confirm that ORF85-1 is a
surface-exposed protein, and that it is a useful immunogen.

Example 92 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 773>:

1 ATTCCCGCCA CGATGACATT TGAACGCAGC GGCAATGCTT ACAAAATCGT
51 ttcgacgatt AAAGTGCCGC TATACAATAT CCGTTTCGAG TCCGGCGGTA
101 CGGTTGTCGG CAATACCCTG CACCCTACCT ACTATAGAGA CATACGCAGG
151 GGCAAACTGT ATGCGGAAgc CAAATTCGCC GACgGcAGCG TAACTTACGG
201 CAAAGCGGGC GAGAGCAAAA CCGAGCAAAG CCCCAAGGCT ATGGATTTGT
251 TCACGCTTGC CTGGCAGTTG GCGGCAAATG ACGCGAAACT CCCCCCGGGG
301 CTGAAAATCA CCAACGGCAA AAAACTTTAT TCCGTCGGCG GTTTGAATAA
351 GGCGGGTACA GGAAAATACA GCATAGGCGG CGTGGAAACC GAAGTCGTCA
401 AATATCGGGT GCGGCGCGGC GACGATGCGG TAATGTATTT cTTCGCACCG
451 TCCCTGAACA ATATTCCGGC ACAAATCGGC TATACCGACG ACGGCAAAAC
501 CTATACGCTG AAACTCAAAT CGGTGCAGAT CAACGGCCAG GCAGCCAAAC
551 CGTAA

This corresponds to the amino acid sequence <SEQ ID 774; ORF120>:

1 ..IPATMTFERS GNAYKIVSTI KVPLYNIRFE SGGTVVGNTL HPTYYRDIRR
51 GKLYAEAKFA DGSVTYGKAG ESKTEQSPKA MDLFTLAWQL AANDAKLPPG
101 LKITNGKKLY SVGGLNKAGT GKYSIGGVET EVVKYRVRRG DDAVMYFFAP
151 SLNNIPAQIG YTDDGKTYTL KLKSVQINGQ AAKP*

Further work revealed the complete nucleotide sequence <SEQ ID 775>: 

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 

51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CCAATCCGCC GTGCTGCACT 

101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC 

151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 

201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT 

251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 

301 GGCAGCGTAA CTTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC 

351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG 

401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC

451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA 

551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 

601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA 

651 CGGCCAGGCA GCCAAACCGT AA 

This corresponds to the amino acid sequence <SEQ ID 776; ORF120-1>: 



1 MMKTFKNIFS AAILSAALPC AYAAGLPQSA VLHYSGSYGI PATMTFERSG
51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD
101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS
151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY
201 TDDGKTYTLK LKSVQINGQA AKP*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF120 shows 92.4% identity over a 184 aa overlap with an ORF (ORF120a) from strain A of N.
meningitidis:

10 20 30 

orf 120 . pep IPATMTFERSGNAYKIVSTIKVPLYNIRFE 

ttll: I I I I I I I 1 I I I I I I I I I I 

orf 120a SAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIKVPLYNIRFE 
10 20 30 40 50 60 



40 50 60 70 80 90 

or f 12 0 . pep SGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 
I It I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I : I I I I I II I 1 II I I I I 
orf 120a SGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAMDLFTLAWQL 
70 80 90 100 110 120 

100 110 120 130 140 150 

or f 120 . pep AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDAVMYFFAP 
I I I I I I I I I I I I I I II I I II I I I I I I I II I I I I I I I I I I I I I M I I I II I I I I I I I I I I I 
orf 120a AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDAVMYFFAP 
130 140 150 160 170 160 
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160 170 180 

orf 120 . pep SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
lllllllHltllMltMMIMllMMMlll 
o - f 1 2 0a SLNNI PAQIGYTDDGKT YTLKLKSVQINGQAAKPX 

190 200 210 220 

The complete length ORF120a nucleotide sequence <SEQ ID 777> is: 

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 

51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CNAATCCGCC GTGCTGCACT 

101 ATTCCGGCAG CTACGGCATT CCCGCCACNA NNANNTNNGN ACNNNGNGNC 

151 AATGCTTNCA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 

201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT 

251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 

301 GGCAGCGTAA CCTACGGCAA AGCGGNNNNN ANCNNNNNNG NGCAAAGCCC 

351 CAAGGCTATG GATTTGTTCA CGCTTGCNTG GCAGTTGGCG GCAAATGACG 

401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 

451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA 

551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 

601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA 

651 CGGCCAGGCA GCCAAACCGT AA 

This encodes a protein having amino acid sequence <SEQ ID 778>: 

1 MMKTFKNIFS AAILSAALPC AYAAGLPXSA VLHYSGSYGI PATXXXXXXX
51 NAXKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD
101 GSVTYGKAXX XXXXQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS
151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY
201 TDDGKTYTLK LKSVQINGQA AKP*

ORF120a and ORF120-1 show 93.3% identity in 223 aa overlap: 

10 20 30 40 50 60 

orf 120a pep MMKTFKNIFSAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIK 
I I I \ I I I I I I I I I I I I I! II II I I I I I I I I I I I I M I I M I I : I I I I I I I I I 

orf 120-1 MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 120a pep VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAM 
|| l| || II || MUi Mil II I II II II II MM I IMI II Mil 111 : HUM 
orf 120-1 VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 120a pep DLFTUVWQU^DAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD 
1 1 1 t II t II 1 1 M M I M I I E M 1 M I M II I i I M I M I It I I I 1 I I I I I M I M I I 11 
orf 120-1 DLFTLAWQIAANDAKLPPGLKITNGKKLYSVGGIJ^KAGTGKYSIGGVETEVVKYRVRRGD 
130 140 150 160 170 180 

190 200 210 220 

orfl20a pep DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
M I M I II U I I II II II II II M M U II I I II M 1 I M I I M 
orf 120-1 DAVMY FFAP S LKN I PAQ I G YT DDGKT YTLKLKS VQ I NGQAAKPX 

190 200 210 220 
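
When one sequence contains undetermined residues (X, arising from N bases in the nucleotide listing), a reported identity below 100% need not reflect true substitutions. The sketch below separates the two kinds of mismatch; `classify_mismatches` is a hypothetical helper and the treatment of X as "unknown" is an assumption made for illustration. It is applied to the first 60 aligned residues of ORF120a and ORF120-1. The fifteen X positions in SEQ ID 778 are consistent with the quoted figure, since (223 - 15) / 223 rounds to 93.3%.

```python
def classify_mismatches(a: str, b: str, ambiguous: str = "X") -> tuple:
    """Count mismatched columns of two pre-aligned, equal-length sequences.

    Returns (unknown, substitutions): mismatches where either residue is an
    ambiguity code, and mismatches between two determined residues.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    unknown = substitutions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        if x in ambiguous or y in ambiguous:
            unknown += 1
        else:
            substitutions += 1
    return unknown, substitutions


# Residues 1-60 of ORF120a and ORF120-1 as aligned above.
orf120a = "MMKTFKNIFSAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIK"
orf120_1 = "MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK"

print(classify_mismatches(orf120a, orf120_1))
print(round(100.0 * (223 - 15) / 223, 1))
```

In this stretch every mismatch involves an X, so none of the differences is a determined amino acid substitution.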

Homology with a predicted ORF from N. gonorrhoeae 

ORF120 shows 97.8% identity over 184 aa overlap with a predicted ORF (ORF120ng) from 
N. gonorrhoeae: 

orf 120 pep I PATMTFERSGNAYKIVSTIKVPLYNIRFE 30 

I 1 I I M I II I M M II M M ! II I I M I M 
orfl20ng SAAILSAALPCAYAARLPQSAVLHYSGSYGI PATMTFERSGNAYKIVSTIKVPLYNIRFE 69 

orfl20 pep SGGTWGKTIJiPTYYROIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWOL 90 

M I II I II M I I : M : II I M MMMMMIMMI I I I M I I M I I I I 

orfl20ng SGGTWGNTLH PAY YKD I RRGKL YAEAK FADGS VT YGKAGE SKTEQS PKAMDL FTLAWQL 129 
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orfl20 pep AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDAVMYFFAP 150 

P P M I I I I I I J I I Ml | | | I I I I M M M m H I I I | I | | 1 1 1 1 II I I I M I t : I 1 1 I I I 
orfl20ng AANDAKLPPG1^ITNGKKLYSVGG1^KAGTGKYSIGGVETE\A^RVRRGDDTVTYFFA^ 189 

orfl20.pep SLNNIPAQIGYTDDGKTYTLKLKSVOINGQAAKP 184 

I I I I 1 1 I I t I I I 1 I I J I i I I 1 1 M I t 1 I I 1 I I I 1 
orfl20ng SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP 223 

The complete length ORF120ng nucleotide sequence <SEQ ID 779> is:

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC
51 CCTGCCGTGC GCGTATGCGG CAAGGCTACC CCAATCCGCC GTGCTGCACT
101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC
151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG
201 TTTCGAATCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTGCCTACT
251 ATAAAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC
301 GGCAGCGTAA CCTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC
351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG
401 CGAAACTCCC CCCGGGTCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC
451 GTCGGCGGCC TGAATAAGGC GGGTACGGGA AAATACAGCA TaggCGGCGT
501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATACGGTAA
551 CGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT
601 ACCGACGACG GCAAAACCTA TACGCTGAAG CTCAAATCGG TGCAGATCAA
651 CGGACAGGCC GCCAAACCGT AA

This encodes a protein having amino acid sequence <SEQ ID 780>:

1 MMKTFKNIFS AAILSAALPC AYAARLPQSA VLHYSGSYGI PATMTFERSG
51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PAYYKDIRRG KLYAEAKFAD
101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS
151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DTVTYFFAPS LNNIPAQIGY
201 TDDGKTYTLK LKSVQINGQA AKP*

In comparison with ORF120-1, ORF120ng shows 97.8% identity in 223 aa overlap:

10 20 30 40 50 60 

orf 120-1. pep MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 
| I i I t i 1 I I I 1 t I f I I I t I I ! I I I 1 1 I I I I I t I I I f t I I I t I 1 I I I 1 1 I I I 1 I I I I I 1 I 
orfl20no MMKTFKNIFSAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 
35 " 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 120-1. pep VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ] i x 1 1 = < 1 1 1 1 1 1 1 1 1 i « 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 

40 orfl20no V PLYN I R FESGGT WGNT LHPAYYKD I RRGKL YAEAKFADGSVTYGKAGE S KTEQS PKAM 

70 80 90 100 110 120 

130 140 150 160 170 180 

or f 120-1 . pep DLFTIAWQIAANDAKLPPGLKITNGKKLYSVGGL^ 
45 II I I I II I I I I I I II I I I I t I I 11 t M I I I I I I I I I M I I I I I M I I M M I I I I I I I I I 

orfl20nq DLFTIAWQLAANDTVKLPPGLKITOGKKLYSVGGl^KAGTGKYSIGGVETEVVKYRVRRGD 

130 140 150 160 170 180 

190 200 210 220 

50 orf 120-1 .pep DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 

I : I I I I I I I I I I I I II ! I I It I 11 I II I I I I I I I I I I I 

orf!20ng DTVTYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
190 200 210 220 

This analysis, including the presence of a putative leader sequence in the gonococcal protein,
suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 93 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 781>:

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC

51 . GCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATT. . 

This corresponds to the amino acid sequence <SEQ ID 782; ORF121>: 

1 MYRRKGRGIK PWMGAGXAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV 
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL 
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 
151 RQGGNI . . 

Further work revealed the complete nucleotide sequence <SEQ ID 783>:

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC 

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TTGCCAAACT GGTTCCGAgG CGTTTTGCCG GTGCTTATAC GCGCATTACA 

601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT 

651 AATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGGTG CTGGTCGGGC 

701 TGGATTCGGG GTTTGCCATC GGTATGCTTG CCGGTATTTT GGTGTTTGTC 

751 CCTTATCTCG GGGCGTTTAC GGGATTGCTG CTTGCCACCG TCGCCGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGCATCCT ATCGGTTTGG GCGGTTTTTG 

851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA 

901 GACCGTATCG GGCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCGGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC

1051 AGTTTTTACC GGGGCAGGTA G 

This corresponds to the amino acid sequence <SEQ ID 784; ORF121-1>:

1 MYRRKGRGIK PWMGAGAAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM
151 RQGGNIVSSI GNLLLLPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLV LVGLDSGFAI GMLAGILVFV
251 PYLGAFTGLL LATVAALLQF GSWNGILSVW AVFAVGQFLE SFFITPKIVG
301 DRIGLSPFWV IFSLMAFGQL MGFVGMLAGL PLAAVTLVLL REGVQKYFAG
351 SFYRGR*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF121 shows 98.7% identity over a 156 aa overlap with an ORF (ORF121a) from strain A of N.
meningitidis:

10 20 30 40 . 50 60 

or f 12 1 . pep MYRRKGRG I KPWMGAGXAFAALVWLV FALGDTLT PFAVAAVLAYV LDPLVEWLQKKGLNR 
| 1 | | 1 1 J ! I I 1 I I II 1 I 1 t t I I I 1 I 1 t I 1 I I I I I 1 I I I 1 I I I I I I I I I I I I I I I I I I I 
O r f 1 2 1 a MYRRKGRG I KPWMDAGAAFAALVWLVFALGDTLT PFAVAAVLAYVLDPLVEWLQKKGLN R 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 121 . pep ASASMSVMVFSLILIJALLLIIVPMLVGQFNNI^RLPQLIGFMQNTLLPWLJOT'IGGYV 
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10 



I I I ! I II | | M | || | | | I I I I I I I II I I I I I I I I I I I I I I I I I I I I I M Mill I I I I I I I 
orfl21a ASASMSVMVFSLILLLALLLIIVPMLVGQFNNIJVSRLPQLIGFMQNTLLPWLKNTIGGYV 
70 80 90 100 I 10 120 

130 140 150 

orf 121 .pep EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNI 
I 1 1 M It I I M It M M M M M I M I I I M I M I I 
orfl21a EIDOASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW 
130 140 150 160 170 180 

o-fl21a SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI 
190 200 210 220 230 240 

The complete length ORF121a nucleotide sequence <SEQ ID 785> is:

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG ATGCCGGTGC
51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA
101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC
151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT
201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC
251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA
301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG
351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC
401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG
451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC
501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA
551 TTGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACA
601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT
651 GATGCTGATT ATGGGTTTGG TTTACGGCTT GGGGTTGGTG CTGGTCGGGC
701 TGGATTCGGG GTTTGCAATC GGTATGGTTG CCGGTATTTT GGTTTTTGTT
751 CCCTATTTGG GCGCGTTTAC AGGACTGCTG CTGGCAACCG TCGCCGCCTT
801 GCTCCAGTTC GGTTCGTGGA ACGGCATCTT GGCTGTTTGG GCGGTTTTTG
851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA
901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT
951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG
1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC
1051 AGTTTTTACC GGGGCAGGTA G



This encodes a protein having amino acid sequence <SEQ ID 786>: 

1 MYRRKGRGIK PWMDAGAAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM
151 RQGGNIVSSI GNLLLLPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLV LVGLDSGFAI GMVAGILVFV
251 PYLGAFTGLL LATVAALLQF GSWNGILAVW AVFAVGQFLE SFFITPKIVG
301 DRIGLSPFWV IFSLMAFGQL MGFVGMLAGL PLAAVTLVLL REGVQKYFAG
351 SFYRGR*
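
The correspondence between a nucleotide listing and the protein it encodes can be checked mechanically. The following is a minimal sketch only, assuming the standard genetic code and translation in the first reading frame; the name `translate` is illustrative and does not form part of the disclosure. Applied to the first row of <SEQ ID 785>, it gives the first sixteen residues of <SEQ ID 786>.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative translation table is needed for these listings).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA listing in frame 1; whitespace between blocks is ignored.

    Codons containing ambiguous bases (for example N) are reported as 'X',
    and a trailing incomplete codon is dropped.
    """
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# First row of the ORF121a listing (positions 1-50).
first_row = "ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG ATGCCGGTGC"
print(translate(first_row))  # MYRRKGRGIKPWMDAG
```

Stop codons are rendered as `*`, matching the convention used in the protein listings.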

ORF121a and ORF121-1 show 99.2% identity in 356 aa overlap:



10 20 30 40 50 60 

orf 121a. pep MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 
I 1 I I I I I I I t I I I imilllMMIIIMIMIIIIIIMMIIIMIMMIIMII 
orf 121-1 MYRRKGRG I KPWMGAGAAFAALVWLVFALGDTLT PFAVAAVIAYVLDPLVEWLQKKGLNR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 121a . pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 
I II II I MM M II It III II MMMM Ml Mill I II Ml MM III II II Ml Ml 
orf 121-1 ASASMSVMVFSLILLIJUjLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

70 80 90 100 110 120 






130 140 150 160 170 180 

orf 121a. pep EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW 

1 1 I 1 1 I I M M M I M I I 1 1 I I I II M M it M II I M I I M I 1 1 M 1 I I II I M 

orf 121-1 EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf121a.pep  SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1     SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
190 200 210 220 230 240 

250 260 270 280 290 300 

GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG 
||:| I | 1 | | 1 | | | | I I I I I 1 t 1 1 I 1 I i I I I I 1 I 1 I I I x t 1 t i I t 1 I I I I I t t I I I t 1 t I I 
GMLAGILVFVPYLGAFTGLLIATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG 

250 260 270 280 290 300 



310 320 330 340 350 

orf121a.pep  DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
             |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1     DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX

310 320 330 340 350 
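
The identity figures quoted for each comparison can be reproduced from the aligned rows. The sketch below is illustrative only: it assumes that identity is the number of identical columns divided by the total number of aligned columns, with a gap column counted as a mismatch, and the function name `percent_identity` is hypothetical. For the comparison above, 353 identical positions out of 356 round to the stated 99.2%.

```python
def percent_identity(row_a, row_b):
    """Percentage of identical columns between two equal-length aligned rows.

    A column containing a gap ('-') never counts as identical, but it does
    count towards the overlap length (an assumption about how the quoted
    figures were derived).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    same = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return round(100.0 * same / len(row_a), 1)


# First twenty aligned residues of ORF121a and ORF121-1 (one difference, D/G).
print(percent_identity("MYRRKGRGIKPWMDAGAAFA", "MYRRKGRGIKPWMGAGAAFA"))  # 95.0

# Whole-overlap arithmetic for the comparison above: 353 of 356 positions.
print(round(100.0 * 353 / 356, 1))  # 99.2
```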

Homology with a predicted ORF from N. gonorrhoeae

ORF121 shows 97.4% identity over a 156 aa overlap with a predicted ORF (ORF121ng) from 
N. gonorrhoeae: 

orf121.pep   MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 60
             |||||||||||||||| |||||||||:|||||||||||||||||||||||||||||||||
orf121ng     MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 60

orf121.pep   ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121ng     ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 120

orf121.pep   EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNI 156
             ||||||||||:|||||||||||||||||||:|||||
orf121ng     EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSTIGNLLLPPLLLYYFLLDWHRW 180

An ORF121ng nucleotide sequence <SEQ ID 787> was predicted to encode a protein having amino 
acid sequence <SEQ ID 788>: 

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM
151 KQGGNIVSTI GNLLLPPLLL YYFLLDWHRW SCGIPKLVPR RFAGAYTRIT
201 GNLNKVWGKF LRGQLLGETE RGAWCRVGR ECWEGGGARS RPSDDGWPRW
251 GGG*

Further work revealed the following gonococcal DNA sequence <SEQ ID 789>:

1 ATGTATCGGA GAAAAGGACG GGGCATCAAG CCGTGGATGG GTGCCGGCGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTA CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTGTTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC 

251 CTATGCTGGT CGGGCAGTTC AATAATTTGG CATCTCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG TTTCAGGCGC 

401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AAACAGGGCG GCAATATTGT GAGCAGTATC GGCAACCTGC TGCTGCCGCC 

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TCGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACG 

601 GGTAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGTC AGCTTCTGGT 

651 GATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGATG CTAGTCGGAC

701 TGGATTCGGG ATTTGCCATC GGTATGGTTG CCGGTATTTT GGTGTTTGTC 

751 CCCTATTTGG GTGCGTTTAC GGGATTGCTG CTTGCCACTG TTGCAGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGAATCTT GGCTGTTTGG GCGGTTTTTG 

851 CCGTCGGTCA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATTGTAGGA 

901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGAGAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG CGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 
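
Each nucleotide listing is laid out as a position label followed by blocks of ten bases. As a minimal sketch (the helper name `parse_listing` is hypothetical, and the layout rule is inferred from the listings themselves), such a listing can be joined into one contiguous sequence while confirming that every position label agrees with the number of bases read so far:

```python
def parse_listing(text):
    """Join a numbered sequence listing into a single string.

    Each non-empty line is expected to start with the 1-based position of
    its first base, followed by whitespace-separated blocks of bases.
    A ValueError is raised when a position label does not match the
    running length, which helps to expose dropped or duplicated blocks.
    """
    sequence = ""
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        position = int(fields[0])
        if position != len(sequence) + 1:
            raise ValueError(f"label {position} does not follow {len(sequence)} bases")
        sequence += "".join(fields[1:])
    return sequence


# First two rows of the gonococcal listing above.
rows = """
1 ATGTATCGGA GAAAAGGACG GGGCATCAAG CCGTGGATGG GTGCCGGCGC

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTA CGCGCTCGGC GATACTTTGA
"""
joined = parse_listing(rows)
print(len(joined))  # 100
```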
This corresponds to the amino acid sequence <SEQ ID 790; ORF121ng-1>:

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM
151 KQGGNIVSSI GNLLLPPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLM LVGLDSGFAI GMVAGILVFV
251 PYLGAFTGLL LATVAALLQF GSWNGILAVW AVFAVGQFLE SFFITPKIVG
301 DRIGLSPFWV IFSLMAFGEL MGFVGMLAGL PLAAVTLVLL REGAQKYFAG
351 SFYRGR*

ORF121ng-1 and ORF121-1 show 97.5% identity in 356 aa overlap:

10 20 30 40 50 60 

orf 121-1. pep MYRRKGRGIKPWMGAGAAFAALWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 
||lj||||ltllMlilllll!l(ll:lll!llllillllllMIIIMIIIIIIIIlil 
orfl21nq-l MYRRKGRG IK PWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLD PLVEWLQKKGLNR 

J5 " " 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 12 1-1 pep ASASMSVMVFSLILLl^LLirVPMLVGQE^NlASRLPQLIGFMQNTLLPWLKNTIGGYV 
|| 111 III 1 1 IN] I II II I i I 111 III II II I I I 1 11 II 1 1 M M llllllllllllll 

70 orf!21na-l ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

oniony ^ ^ ^ 100 110 120 

130 140 150 160 170 180 

orf 121-1. pep EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW 
25 i I I i I I I I I I : I I I I I I I I I 1 I I ! I I I I I t : I I I I I I I !! I I H I I 1 1 1 I II I I M I 1 1 

orf 121nq-l EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSSIGNLLLPPLLLYYFLLDWQRW 

130 140 150 160 170 180 

190 200 210 220 230 240 

30 or f 1 2 1 - 1 . pep SCG I AKLVPRRFAGAYTRITGK LNEVLGE FLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI 

1 | | M 1 1 1 I M M 1 M t I I I ! I I 1 I I 1 M I I I i 1 M 1 M I 1 1 I M I M t : 1 1 M I I I M 1 
orfl21na-l SCG I AKLVPRRFAGAYTRITGN LNEVLGE FLRGQLLVMLIMGLVYGLGLMLVGLDSGFAI 

y 190 200 210 220 230 240 

35 250 260 270 280 290 300 

orf 121-1 . pep GMLAG I LVFVPYLGAFTGLLLAT VAALLQFGSWNG I LSVWAVFAVGQFLE S FFI T PKI VG 
| I x 1 I | I I 1 I I I I t 1 I I I I 1 I 1 I I I I I 1 1 1 1 I I I I I I ^ I I I I I I 1 I t 1 1 1 I I I 1 t t i i 1 I 
orfl21na-l GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG 
9 250 260 270 280 290 300 

40 

310 320 330 340 350 

orf 12 1-1 . pep DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 = 1 1 1 1 1 1 1 r 1 1 1! 1 1 1 1 1 1 1 1 1 1 1 1 • I e 1 1 1 1 1 1 1 a i f i 

orfl21na-l DRIGLSPFWV I FSLMAFGELMGFVGMLAGLPLAAVTLVLLREGAQKY FAGS FYRGRX 

45 ~ ^ 310 320 330 340 350 

In addition, ORF121ng-1 shows homology to a permease from H. influenzae:

sp|P43969|PERM_HAEIN PUTATIVE PERMEASE PERM HOMOLOG Length = 349
Score = 69.9 bits (168), Expect = 2e-11

Identities = 67/317 (21%), Positives = 120/317 (37%), Gaps = 7/317 (2%)

Query: 26 VYALGDTLTPFAVAAVLAYVLDPLVEWL-QKKGLNRASASMSVMVFSXXXXXXXXXXXVP 84

+Y GD + P +A VL+Y+L+ + +L Q R A++ + VP 

Sbjct: 32 IYFFGDLIAPLLIALVLSYLLEIPINFLNQYLKCPRMLATILIFGSFIGLAAVFFLVLVP 91 

Query: 85 MLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYVE-IDQASIIAWFQAHTGELSNALK 143

ML Q +L S LP + N WL N YEID + + + F + ++ + 

Sbjct: 92 MLWNQTISLLSDLPAMF NKSNEWLLNLPKNYPELIDYSMVDSIFNSVREKILGFGE 147 

Query: 144 AWFPVLMKQGGNIVSSIGNXXXXXXXXXXXXXDWQRWSCGIAKLVPRRFAGAYTRITGNL 203

60 + + + N+VS D G+++ +P+ A+ R + 

Sbjct: 148 SAVKX*SIASIMNLVSI^IYAFLVPL^FFMliCDKSELLQGVSRFLPKNiWIAFXRWK-^ 206 

Query: 204 NEVLGEFLRGQXXXXXXXXXXXXXXXXXXXXDSGFAIGMVAGILVFVPYXXXXXXXXXXX 263
y + + ++ G+ + + G+ V VPY 

65 Sbjct: 207 QQQISNYIHGKLLEILIVTLITYIIFLIFGLNYPLLLAFAVGLSVLVPYIGAVIVTIPVA 266 



Query: 264 XXXXXQFGSWNGILAVWAVFAVGQFLESFFITPKIVGDRIGLSPFWVIFSLMAFGELMGF 323

QFG + FAV Q L+ + p + + + L P +1 S++ FG L GF 

Sbjct: 267 LVALFQFGISPTFWYI I IAFAVSQLLDGNLLVPYLFSEAVNLHPLI II ISVLI FGGLWGF 326 

Query: 324 VGMLAGLPLAAVTLVLL 340

G+ +PLA ++ 
Sbjct: 327 WGVFFAIPLATLVKAVI 343 
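
The percentages in the BLAST summary line follow directly from the reported counts. As a brief illustration (the function name `blast_percent` is hypothetical, and integer truncation is an assumption chosen because it reproduces the printed values, in particular 120/317 being shown as 37% rather than 38%):

```python
def blast_percent(count, alignment_length):
    """Whole-number percentage of `count` over the alignment length, truncated."""
    return 100 * count // alignment_length


# Counts taken from the summary line: 317 aligned columns in total.
for label, count in (("Identities", 67), ("Positives", 120), ("Gaps", 7)):
    print(f"{label} = {count}/317 ({blast_percent(count, 317)}%)")
```

With only about a fifth of positions identical, the match is a distant one, which is consistent with the reported expectation value of 2e-11 rather than a near-zero value.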

Based on this analysis, including the presence of a putative leader sequence and transmembrane
domains in the two proteins, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

Example 94 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 791>:

1 ..ACTGCTTTTT CGGCGGCGCT GCGCTTGAGT CCATCATGAC TCGTCATATT
51 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT
101 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC
151 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG
201 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG
251 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTGTGG GTTTCTGTGC
301 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC
351 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT
401 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC
451 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC
501 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAG..

This corresponds to the amino acid sequence <SEQ ID 792; ORF122>:

1 ..TAFSAALRLS PSXLVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR
51 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRRECGFLC
101 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT
151 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQ..

Further work revealed the complete nucleotide sequence <SEQ ID 793>:

1 ATATCGTACT GGGCAAGCAG TTCGCCGGAT TTTTTGGAAG TAGATACCGC 

51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA 

101 TGGTCGAGCC GGTACCGATG CCGATATATT CATTTTCGGG TACGAATTCG 

151 ACTGCTTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT 

201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT

251 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC 

301 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 

351 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 

401 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTTTGG GTTTCTGTGC 

451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC

501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 

551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 

601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 

651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT 

701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT

751 CGTCATCGTT TGTGTTCCTG A 

This corresponds to the amino acid sequence <SEQ ID 794; ORF122-1>:

1 ISYWASSSPD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PIYSFSGTNS
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR
101 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRREFGFLC
151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT
201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV
251 RHRLCS*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF122 shows 94.0% identity over a 182aa overlap with an ORF (ORF122a) from strain A of N.
meningitidis:



orfl22.pep 
orfl22a 



10 20 30 

TAFSAALRLSPSXLVI FLS FGKPYQQTAAI 

111111:111 t :|III1IMMIMMI 
FLPLLPKASMKKU^^PVPMPMYSFSGTNSTAPSAAMRLSSSCWin-SFGKPYQQTAAI 

30 40 50 60 70 90 

in 40 50 60 70 80 90 

orfl22.pep LTFFCTSCPPRSNAYQQYRRU^ 

PP | hi | til illl 111! II till Ml NWIMMII I I 11 II M I I I III I I Ml 
orfl22a LTFFXT SC PPR SN PYQQYRRLRLYAFHAPE ITE FFVG FAFX VDARNVYAQ IGGDVGTHLR 

90 100 HO 120 130 140 



15 



orfl22.pep 
orfl22a 



X00 110 120 130 140 150 

NVRRECGFLCNHGRI DIDRLPTLRLNALIRRTQKDAAVRI FELCGGVGEMAADIAQTCRT 

|:||| M I t M t 1 I 1 1 M U I 1 1 M I II I I 11 I I t 1 I M 1 I I I Mil II II Ml 

NMRREFGFLCNHGRI DI DRLPTLRLNAL IRRTQKDAAVRI FELCGGVGEMAADIAQTCRT 



20 iso 



160 170 180 190 200 



160 170 180 

orf 122 . pep EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 
| | | | | It II It II II I 1 1 I I M II M II I I I I 
orf 122a EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDVRHRLCSX 

* 210 220 230 240 250 

The complete length ORF122a nucleotide sequence <SEQ ID 795> is: 



1 ATATCATATT
51 GCCTTTGATT
101 TGGTCGAACC GGTACCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG
151 ACTGCNTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT
201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT
251 TTNNNACGTC CTGCCCGCCG CGTTCAAATC CTTACCAGCA ATACCGCCGC
301 CTGCGACTCT ATGCCTTCCA TGCGCCCGAG ATAACCGAGT TTTTCGTTGG
351 TTTTGCCTTT GANGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG
401 ATGTTGGCAC GCATTTGCGG AATATGCGGC GCGAGTTTGG GTTTCTGTGC
451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT
551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC
601 GAGCAGCGCG
651 CGAGCAGCCC
701 CTGCCTTCGG
751 CGTCATCGTT



This encodes a protein having amino acid sequence <SEQ ID 796>: 

1 ISYWASSSLD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFXTSCPP RSNPYQQYRR
101 LRLYAFHAPE ITEFFVGFAF XVDARNVYAQ IGGDVGTHLR NMRREFGFLC
151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT
201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV
251 RHRLCS*

ORF122a and ORF122-1 show 96.9% identity in 256 aa overlap: 

10 20 30 40 50 60 

orfl22a nep ISYWTVSSSLDFIXVDTAPLIFLPU-PKASMKKLMVEPVPME^YSFSGTNSTAFSAAMRLS 
orfl22a.pep ^ | 1 1 1 1 1 1 1 1 1 1 1 M M II 1 1 1 M I I II I I I I I : M I II 1 1 1 H 1 1 1 1 Ml I 
55 orf 122-1 ISYWASSSPDFIXVDTAPLIFLPIJiPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS 

1Q 20 30 40 50 60 

70 80 90 100 110 120 

orn22a oep SSCWIFLSFGKPYQQTAAILTFFXTSCPPRSNPYQQYRRLRLYAFHAPEITEFFVGFAF 
60 or.l22a.pep millUMIM I I I I I I I M IMMMMMII 111:11111 Ml 

orf!22-l S SCVV I FLS FGKPYQQTAAI LT FFCT SC PPRSN AYQQYRRLRLY AFHPPE I AE FFVG FAF 
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf122a.pep  XVDARNVYAQIGGDVGTHLRNMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI
              ||||||||||||||||||||:||||||||||||||||||||||||||||||||||||||
orf122-1     DVDARNVYAQIGGDVGTHLRNVRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf122a.pep  FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf122-1     FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV
                    190       200       210       220       230       240

                    250
orf122a.pep  DIVALSDTDVRHRLCSX
             |||||||||||||||||
orf122-1     DIVALSDTDVRHRLCSX
                    250



Homology with a predicted ORF from N. gonorrhoeae 

ORF122 shows 89.6% identity over a 182 aa overlap with a predicted ORF (ORF122ng) from
N. gonorrhoeae:

orf 122. pep 
orfl22ng 
orf 122. pep 
orfl22ng 
orf 122. pep 
orfl22ng 
orf 122. pep 
orfl22ng 

The complete length ORF122ng nucleotide sequence <SEQ ID 797> is: 



TAFSAALRLS PSXL V I FLS FGKP YQQTAAI 3 0 
MMMMII I : I I I I I I I i I I I I 1 I I I 
FLPLLPKASMKKLMVEPVPMPKVSFSGTNSTAFSAAMRLSSSCWIFLSFGKPYQQTAAI 80 

LT FFCT S C P PRSNAYQQYRRLRL Y AFH P PE IAE FFVG FAFDVDARNVYAQI GG DVGTHLR 90 
IMIIM lllll M I M I I III I I I I 1 I II I I I I I I I I I : I M I : tlllllilllll 
LT FFCT SW P PRSN P YQQ Y RRLRL Y AFH P PE IAE FFVG FAFD I DARN I DT Q I GG D VGTH L R 140 

NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 150 

III ) I I I I I I I I I I I I : I I I I I I I I I I M I I I I I I I \ I I I I I I I I I : I I I I : M I I I I 
NVRCEFGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRIFELCGGVGKMAADVAQTCRT 200 

EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 182 
I I I I I I I I I I I : I I : I I I I I I I I II I M I I 

EQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDIRHRLCS 256 



1 ATGTCGTACC GGGCAAGCAG TTCGCCGGAT TTTTTGGAGG TTGAAACCGC
51 GCCTTTGATT TTTTTACCGC TTTTGCCCAA GGCTTCGATG AAGAAATTGa
101 tgGTCGAACC GgtaCCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG
151 ACTGCTTTTT CGGCGGCGAT GCGCttgAgt TCgtcttgcg TcgTCATATT
201 TTTAtCCttt gGGAAaccct atcaAcaAAc agccgccatC TTAACATTTT
251 TTTGCACGtC ctggccgccg cgttcaAATc cgtaccaGca ataccgccgc
301 ctgcgcctCT AtgcCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG
351 TTTTGCCTTT GATatTGACG CACGAAATAT CGatacCCAa atcggcgGCG
401 ATGTTGGCAC GCATTTGCGG AATGTGCGGT GCGAGTTTGG GTTTCTGTGC
451 AATCACGGTC GTATCGACAT TGACCACCTG CCAACCCTGC GCCTGAACGC
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT
551 GCGGCGGTGT CGGGAAAATG GCTGCCGATG TCGCCCAAAC CTGCCGCACC
601 GAGCAGCgcg tcggtaaCGG CGTGCAGCAG cgcgTcgGCA TCCGAATGCC
651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT
701 CTGCCTTCGG TCAATTGGTG GACATCGTAG CCCTGTCCGA TACGGATATT
751 CGTCATCGTT TGTGTTCCTG A



This encodes a protein having amino acid sequence <SEQ ID 798>: 



1 MSYRASSSPD FLEVETAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSWPP RSNPYQQYRR
101 LRLYAFHPPE IAEFFVGFAF DIDARNIDTQ IGGDVGTHLR NVRCEFGFLC
151 NHGRIDIDHL PTLRLNALIR RTQKDAAVRI FELCGGVGKM AADVAQTCRT
201 EQRVGNGVQQ RVGIRMPEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDI
251 RHRLCS*

ORF122ng and ORF122-1 show 92.6% identity in 256 aa overlap:






orf 122-1. pep 
orfl22ng 



orf 122-1. pep 
orfl22ng 



orf 122-1. pep 
orf 122ng 



orf 122-1. pep 
orfl22ng 

orf 122-1. pep 
orfl22ng 



10 20 30 40 50 60 

ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS 
:ll III II MM I t I I M I I I I I I I | | | | | I I I : I I I I I I I t I I I I I I I I I I 

MSYRASSSPDFLEVETAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS

10 20 30 40 50 60 

70 80 90 100 110 120

SSCWIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF 

I I I I I I I | | | | | M | | 1 | | I I I I I I I I I I 1 I I I I I I I M M M I II I I I I I I I I I I I I 
SSCWIFLSFGKPYQQTAAILTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAF 
70 80 90 100 110 120 

130 140 150 160 170 180 

DV DARN V Y AQ I GG DVGTH LRNVRRE FGFLCNHGRIDIDRLPT LRLN AL I RRT QK DAAVR I 
I : II I I : : I I M I I I I I I I N I M I I I I I I M I I I I : I I I I I I I 1 I I I I I I I I I I I I I 
D I DARN I DTQ I GGDVGTHLRNVRCE FGFLCNHGRI DI DHLPT LRLNALI RRTQKDAAVRI 

130 140 150 160 170 180 

190 200 210 220 230 240 

FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 

Ml IIIICII I l:tl Mill I I i I I I I I I I : I I : I I I I II II I I I I I I I I I I I I I I I 
FELCGGVGKMAADVAQTCRTEQR VGNGVQQRVG IRMPEQP FFKW DFN SAKYQLS AFGQLV 
190 200 210 220 230 240 

250 

DIVALSDTDVRHRLCSX 
I I I I I I I I I : I I I I I t I 
DIVALSDTDIRHRLCSX 
250 
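
Where two sequences differ at only a few positions, listing the substitutions can be more informative than the overall percentage. The following is a sketch under stated assumptions: the helper `differences` is hypothetical, both rows are taken to be already aligned without gaps, and `start` is the residue number of the first column. It is applied to the final aligned segment above, which begins at residue 241.

```python
def differences(row_a, row_b, start=1):
    """Return (position, residue_a, residue_b) for every mismatched column."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return [
        (start + offset, a, b)
        for offset, (a, b) in enumerate(zip(row_a, row_b))
        if a != b
    ]


# Residues 241-256 of ORF122-1 (first row) and ORF122ng (second row).
print(differences("DIVALSDTDVRHRLCS", "DIVALSDTDIRHRLCS", start=241))
# [(250, 'V', 'I')]
```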



Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 95 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 799>: 

1 ..GCCGGCGCGA GTGCGAACAA CATTTCCGCG CGTTTTGCGG AAACACCCGT

51 CGCTGTCAGC GTTACCCTGA TCGGCACGGT ACTTGCCGTC ATGCTGCCCG 
101 TTACCGAATA TGAAAACTTC CTGCTGCTTA TCGGCTCGGT ATTTGCGCCG 
151 ATGGGGCGGA XTTTGATTGC CGACTTTTTC GTCTTGAAAC GGCGTGA 

This corresponds to the amino acid sequence <SEQ ID 800; ORF125>: 

1 ..AGASANNISA RFAETPVAVS VTLIGTVLAV MLPVTEYENF LLLIGSVFAP

51 MGGFDCRLFR LETA*

Further work revealed the complete nucleotide sequence <SEQ ID 801>: 

1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCTCCGCCA TCGGGCTGAT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC 

101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CGGCTCTACT TTTGGGTCAT

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACGCAGC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA

401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC 

451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAAGT 

501 CTTTTCCACG GCAGGCAGCA CCGCCGCACA GGTTTCAGAC GGCATGAGTT 

551 TCGGAACGGC AGTCGAGCTG TCCGCCGTGA TGCCGCTTTC CTGGCTGCCG 

601 CTTGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT

651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 

701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG 

751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTCTCCAC 
801 CGTTACCACA ACGTTTCTCG ATGCCTATTC CGCCGGCGCG AGTGCGAACA

851 ACATTTCCGC GCGTTTTGCG GAAACACCCG TCGCTGTCGG CGTTACCCTG 

901 ATCGGCACGG TACTTGCCGT CATGCTGCCC GTTACCGAAT ATGAAAACTT 

951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 

1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG CTTTGACTTT

1051 GCCGGACTGG TTCTGTGGCT TGCGGGCTTC ATCCTCTACC GCTTCCTGCT 

1101 CTCGTCCGGC TGGGAAAGCA GCATCGGTCT GACCGCCCCC GTAATGTCTG 

1151 CCGTTGCCAT TGCCACCGTA TCGGTACGCC TTTTCTTTAA AAAAACCCAA 

1201 TCTTTACAAA GGAACCCGTC ATGA 

This corresponds to the amino acid sequence <SEQ ID 802; ORF125-1>:

1 MSGNASSPSS SSAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KRGSVLFSVA NMLQLAGWTA
101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT
151 VSMLLMLLAV LWLSAEVFST AGSTAAQVSD GMSFGTAVEL SAVMPLSWLP
201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL
251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGA SANNISARFA ETPVAVGVTL
301 IGTVLAVMLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEGFDF
351 AGLVLWLAGF ILYRFLLSSG WESSIGLTAP VMSAVAIATV SVRLFFKKTQ
401 SLQRNPS*
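
A simple consistency check on each complete open reading frame is that the nucleotide length, divided by three, equals the number of residues plus one for the stop codon. The sketch below is illustrative only (the function name `expected_residues` is hypothetical) and assumes the listing begins at the first base of the first codon and ends with the stop codon. The lengths used are those of the complete listings for ORF125-1 (1224 bases), ORF121a (1071 bases) and ORF122-1 (771 bases).

```python
def expected_residues(nucleotide_length):
    """Number of amino acids encoded by a complete ORF that includes its stop codon."""
    if nucleotide_length % 3 != 0:
        raise ValueError("a complete ORF should contain a whole number of codons")
    return nucleotide_length // 3 - 1  # the final codon is the stop signal


for name, length in (("ORF125-1", 1224), ("ORF121a", 1071), ("ORF122-1", 771)):
    print(name, expected_residues(length))
# ORF125-1 407
# ORF121a 356
# ORF122-1 256
```

These values agree with the lengths of the corresponding protein listings, each of which ends in `*`.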

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF125 shows 76.5% identity over a 51aa overlap with an ORF (ORF125a) from strain A of N. 
meningitidis: 

10 20 30 

25 orf 125 . pep AGASANNISARFAETPVAVSVTLIGTVLAV 

I I : 1 I I I I II : :: I I : I I : I : : : I I : I II 
orf 125a K I LLGAGLGAAG I LA WLSTVTTT FLDAYS AGVS ANN I S AKLSE I P I AVAVAWGTL LAV 

250 260 270 280 290 300 

30 40 50 60 

or f 125 . pep MLPVTEYENFLLLIGSVFAPMGGFDCRLFRLETAX 

: I i I I I I I I I I I I I I I I I I I I : 
orf 125a LLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG 
310 320 330 340 

The ORF125a partial nucleotide sequence <SEQ ID 803> is:

1 ATGTCGGGCA ATGCCTCCTC TCNTTCATCT TCCGCCGCCA TCGGGCTGAT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACACTGC 

101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CNGCTCTGCT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACKCANC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA 

401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC 

451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAANT

501 NTTTTCCACG GCAGGCAGCA CCGCCGCANN GGTNNCAGAC GGCATGAGTT 

551 TCGGAACGGC AGTCGAGCTG TCCGCCGTNA TGCCGCTTTC TTGGCTGCCG 

601 CTGGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT 

651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 

701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG

751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTGTCGAC 

801 CGTTACCACC ACTTTTCTCG ATGCNTACTC CGCCGGCGTA AGTGCCAACA 

851 ATATTTCCGC CAAACTTTCG GAAATACCNA TCGCCGTTGC CGTCGCCGTT

901 GTCGGCACAC TGCTTGCCGT CCTCCTGCCC GTTACCGAAT ATGAAAACTT 

951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG

1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG C..

This encodes a protein having the partial amino acid sequence <SEQ ID 804>:

1 MSGNASSXSS SAAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGXX SMESVRLSFG KRGSVLFSVA NMLQLAGWTA
101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT
151 VSMLLMLLAV LWLSAEXFST AGSTAAXVXD GMSFGTAVEL SAVMPLSWLP
201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL
251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGV SANNISAKLS EIPIAVAVAV
301 VGTLLAVLLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEG..



ORF125a and ORF125-1 show 94.5% identity in 347 aa overlap: 
orf125a.pep
orf!25-l 



orf 125a. pep 
orfl25-l 



orf 125a. pep 
orf!25-l 



orfl25a.pep 
orfl25-l 



orfl25a.pep 
orfl25-l 



orfl25a.pep 
orfl25-l 



10 20 30 40 50 60 

MSGNASSXSSSAAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 

till! II I | | = | | I I 1 I I I I I I I I I I I I I I I I I I I I I I I » M 1 I I I I I I I I I 111 I ' ' * 
MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 

20 30 40 50 60 



10 



120 



70 80 90 100 110 

AY I GALTGXX SMES VRLS FGKRG S VLFS VANMLQLAGWTAVM I YAGATVS SALGKVLWDG 
Mil II I I 1 | I I I I 1 I I M I M I I I M I I I I M I M M M I ! I 1 I I I I I I 1 I Ml Ml 
AY I G ALTGRS SME SVRLS FGKRG S VLFS VANMLQLAGWTAVM I YAGATV S S ALGKVLW DG 

70 80 90 100 



110 



120 



130 140 150 160 170 180 

ES FVWW ALAN GALIVLWLVFGARKTGGLKTVSML1^4LLAVLWLSAEXFS TAG STAAXVXD 

| M II j | | | | I) I I I I I II II I I I I I I II M I I II I I I I I I M I I I II I I M I II i I 
ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQVSD 

130 140 150 160 170 180 



190 200 210 220 230 240 

GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF 

iiiMMiiiiiiiiiiiiiiuiniiiiiiiniiiiiiiiHiMiiiiiiiimi 

GMSFGTAVEI^AVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF 
190 200 210 220 230 240 

250 260 270 260 290 300 

TGET DVAKI LLGAGLGAAG I LAWLSTVTTT FLDAYS AGV5 ANN I SAKLSE I PI AVAVAV 
I | I | | I | | | | | | 1 I t I I t I I I I M I I I I I H I I I I ! I I I : I I I H I I I MMM:: 
TGET DVAKI LLGAGLGAAG I LAWLSTVTTT FLDAYS AGASANN I S ARFAET P VAVGVTL 

250 260 270 280 290 300 

310 320 330 340 

VGTLLAVLLPVTEYENFLLLI GS V FAPMAAVLI ADFFVLKRREE IEG 
: 1 1 : I I I : I 1 1 1 1 1 1 1 1 I 1 1 I M I 1 1 I M 1 1 I I 1 1 1 1 1 1 1 M I M M 
I GTVLAVMLPVTEYEN FLLLI GSVFAPMAAVLI ADFFVLKRREE IEG FDFAGLVLWLAGF 

310 320 330 340 350 360 



Homology with a predicted ORF from N. gonorrhoeae

ORF125 shows 86.2% identity over a 65aa overlap with a predicted ORF (ORF125ng) from
N. gonorrhoeae:
orf 125. pep 
orfl25ng 
orf 125. pep 
orfl25ng 



AGASANNISARFAETPVAVSVTLIGTVLAV 30 

imnimm I IMCIIII Mill 

KILLGAGIXSITGIIAWLSTVTTTFLDTYSAGASANNISARFAEIPVAVGVTLIRTVLAV 308 

MLPVTEYENFLLLIGSVFAPM-GGFDCRLFRLETA 64 

|||||||:llllll 111:11 MINIM I : M 
MLPVTEYKNFLLLIRSVFGPMAGG FDCRLFCLKTA 343 



An ORF125ng nucleotide sequence <SEQ ID 805> was predicted to encode a protein having amino
acid sequence <SEQ ID 806>:

1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA
101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT
151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL
201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI
251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT
301 LIRTVLAVML PVTEYKNFLL LIRSVFGPMA GGFDCRLFCL KTA*

Further work revealed the following gonococcal DNA sequence <SEQ ID 807>:

1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCGCCGCCA TCGGGCTGGT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC 

101 TCGCCCCCTT GGGCTGGCAG CGCGGTCTGG CGGCCCTGCT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACGCAGC TCGATGGAAA GTGTGCGCCT GTCGTTCGGC AAATGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGTCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCCTTTG TCTGGTGGGC ATTGGCAAAC GGCGCACTGA 

401 TCGTGCTGTG GCTGGTTTTC GGCGCACGCA GAACGGGCGG GCTGAAAACC 

451 GTTTCGATGC TGCTGATGCT GCTTGCCGTG TTGTGGTTGA GCGTCGAAGT 

501 GTTCGCTTCG TCCGGCACAA ACGCCGCGCC CGCCGTTTCA GACGGCATGA 

551 CCTTCGGAAC GGCAGTCGAA CTGTCCGCCG TCATGCCGCT TTCCTGGCTG 

601 CCGCTGGCCG CCGACTACAC GCGCCAAGCA CGCCGCCCGT TTGCGGCAAC 

651 CCTGACGGCA ACGCTCGCCT ATACGCTGAC GGGCTGCTGG ATGTATGCCT 

701 TGGGTTTGGC GGCGGCTCTG TTTACCGGAG AAACCGACGT GGCGAAAATC 

751 CTGTTGGGCG CGGGCTTGGG CATAACGGGC ATTCTGGCAG TCGTCCTCTC 

801 CACCGTTACC ACAACGTTTC TCGATACCTA TTCCGCCGGC GCGAGTGCGA 

851 ACAACATTTC CGCGCGTTTT GCGGAAATAC CCGTCGCTGT CGGCGTTACC 

901 CTGATCGGCA CGGTGCTTGC CGTCATGCTG CCCGTTACCG AATATAAAAA 

951 CTTCCTGCTG CTTATCGGCT CGGTATTTGC GCCGATGGCG GCGGTTTTGA 

1001 TTGCCGACTT TTTCGTCTTA AAACGGCGTG AGGAGATTGA AGGCTTTGAC

1051 TTTGCCGGAC TGGTTCTGTG GCTGGCAGGC TTCATCCTCT ACCGCTTCCT 

1101 GCTCTCGTCC GGTTGGGAAA GCAGCATCGG TCTGACCGCC CCCGTAATGT 

1151 CTGCCGTTGC CATTGCCACC GTATCGGTAC GCCTTTTCTT TAAAAAAACC 

1201 CAATCTTTAC AAAGGAACCC GTCATGA 

This corresponds to the amino acid sequence <SEQ ID 808; ORF125ng-1>:

1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA
101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT
151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL
201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI
251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT
301 LIGTVLAVML PVTEYKNFLL LIGSVFAPMA AVLIADFFVL KRREEIEGFD
351 FAGLVLWLAG FILYRFLLSS GWESSIGLTA PVMSAVAIAT VSVRLFFKKT
401 QSLQRNPS*

ORF125ng-1 and ORF125-1 show 95.1% identity in 408 aa overlap:

10 20 30 40 50 60 

orf 125-1 . pep MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGIAALLLGHAVGGALFFAA 
I I I II I I I I I I : I I I I : I I I I I I I ! i I I I I I ! I I I I I I I I I I 1 I I I I I I I I I I I t t I I I I 
orfl25ng-l MSGNASSPSSSAAIGLVWFGAAVSIAEISTGTLLAPIX^QRGIAALLLGHAVGGALFFAA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 125-1. pep AY I GALTGRSS ME SVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVS SALGKVLWDG 
I ! I I I I I I I I I I II I I I II I I I I I I I I I I II I II I I II I I II I : I I I I I I I I I I I I I I I 
orfl25ng-l AY I G ALT GRS S ME S VRL S FGKCG S VL FS V AN MLQ LAG WT A VM I YVG AT V S S ALG KVLW DG 

70 80 90 100 110 120 

130 140 150 160 170 179 

orf 125-1 . pep ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQ-VS 
I I I I I I I I I I I I I I I I I II I I I I : I I I I I I I I I I I I I I I II I I I : I I I ::: I : : I I II 
orfl25ng-l ES FVWW ALAN GAL I VLW LV FGARRTGGLKT VSMLLMLLAVLW LS VEV FAS SGTNAAPAVS 

130 140 150 160 170 180 

180 190 200 210 220 230 239 

orf 125-1. pep DGMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAAL 
I I I : I I I I I I I I I 1 1 I I I I I I I I I I I I I : I I I I I I II I i I I I I I I I I I I I ( I I I I I I I I I 
orfl25ng-l DGMT FGT AVE L S AVM P L S W L P LAAD YTRQARR P FAAT LT ATLA YT LTGCWM YALG LAAAL 

190 200 210 220 230 240 

240 250 260 270 280 290 299 

orf 125-1 . pep FTGETDVAKI LLGAGLGAAG I LAWLST VTTT FLDAYS AGASANN I SARFAET PVAVGVT 
lllllilllllllilll : I M I I I I I I I I I I I I I : I ! M I I I I I I I ! I i M MMIII 
orfl25ng-l FTGETDVAKI LLGAGLGITGILAWLSTVTTTFLDTYSAGASANNISARFAEI PVAVGVT 

250 260 270 280 290 300

300 310 320 330 340 350 359

or£125-l Pep "GTVLIWMLPTOmFUXlGSVn^ 

orfl25ng-l li^i^^ 

orflZ5ng i ^ ^ Q 330 340 350 360 



360 370 380 390 400 

orf 125-1 Pep FILYRH-LSSGWESSIGLTAPVMSAVAIATVSVRLFFKKTQS^PSX 
10 I I M I I I I I I II 111 I I I 1 1 II I I II II 1 1 I I 1 1 I I Ml HI I I I Ml 

orf125ng-1 FILYRFLLSSGWESSIGLTAPVMSAVAIATVSVRLFFKKTQSLQRNPSX
370 380 390 400



Based on this analysis, including the presence of a putative leader sequence and transmembrane
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

Example 96 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 809>: 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 

51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAAGCT

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TAGCCGCCGC CATGCTCGCG

151 CCTGCAGCGG A.ACGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG

201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TATGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGT.ACGGA

351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTAA GACGGCATCT ACCTGCCGAC CGAAGC.CAG 

451 CTCGACGGGC GGCAATTATA GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GCCTGCAAG..
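
The numbered listings in this document give a 1-based start position followed by groups of ten residues. A minimal sketch of how such a listing might be read back into a single string, with a check that each row label agrees with the number of residues already read, is given below. The function name `parse_listing` and the variable `rows` are illustrative assumptions and are not part of the original disclosure.

```python
def parse_listing(rows):
    """Join the residue groups of a numbered listing into one string.

    Each row is expected to look like '51 GTTGCAGCTT GCAGAACAAG ...':
    a 1-based start position followed by space-separated groups.
    Returns (sequence, problems), where problems lists every row whose
    label differs from the position expected from the rows read so far.
    """
    chunks = []
    problems = []
    length = 0
    for row in rows:
        if not row.strip():
            continue  # listings are often double-spaced
        label, *groups = row.split()
        expected = length + 1
        if not label.isdigit() or int(label) != expected:
            problems.append((label, expected))
        chunk = "".join(groups)
        chunks.append(chunk)
        length += len(chunk)
    return "".join(chunks), problems


# First three rows of the partial sequence listed above (SEQ ID 809).
rows = [
    "1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC",
    "",
    "51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAAGCT",
    "",
    "101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TAGCCGCCGC CATGCTCGCG",
]
sequence, problems = parse_listing(rows)
print(len(sequence), problems)  # 150 []
```

A row whose label disagrees with the running count is reported rather than silently accepted, which is a simple way to spot a mislabelled or truncated row in a transcribed listing.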

This corresponds to the amino acid sequence <SEQ ID 810; ORF126>:

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKSCRRGEHA AAYVAAAMLA

51 PAAXTVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK

101 PLSSEFVRHL KRGGXTDDEI VRWRADDIAE REPQLGGRFX DGIYLPTEXQ

151 LDGRQLXSAL ADALDELNVP CHWEHECVPE ACK...

Further work revealed the complete nucleotide sequence <SEQ ID 811>:

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 

51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 

201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT

501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GGCCTGCAAG 

551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 

601 TGGAACCAAT CCCCCGAGCA CACCAGCACC CTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGTC 

701 TGCTCCATCC GCGTTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG

801 CGTGCGTTCA GGGTTGGAAC TCTTGTCCGC ACTCTATGCC ATCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 

901 CTCAACCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 

951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA

1001 CCGCCGCCGC CGCCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG
1051 CCCGAACGCG ATAAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA
1101 A

This corresponds to the amino acid sequence <SEQ ID 812; ORF126-1>:

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA
51 PAAEAVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK
101 PLSSEFVRHL KRGGVADDEI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ
151 LDGRQILSAL ADALDELNVP CHWEHECVPE GLQAQYDWLI DCRGYGAKTA
201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV
251 FVIGATQIES ESQAPASVRS GLELLSALYA IHPAFGEADI LEIATGLRPT
301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAARL AVALFDGKDA
351 PERDKESGLA YIRRQD*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF126 shows 90.0% identity over a 180 aa overlap with an ORF (ORF126a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf126.pep  MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP
            |||||||||||||||||||||||||||||||| ||||||||||||||||||||  |||||
orf126a     MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf126.pep  EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI
            |||||||| ||||||||| | |  || |||||||||||||||| ||||||||||  || |
orf126a     EVVRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf126.pep  VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE
            ||||||||||||||||||| |||||||| ||||||  |||||||||||||||||||| ||
orf126a     VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE
                   130       140       150       160       170       180

The complete length ORF126a nucleotide sequence <SEQ ID 813> is:

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCNGGAA GGCTGACCGC 

51 ACTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCT GAAGTGGTCA GGCTGGGCAG 

201 GCAGANCATC CCGCTTTGGC GCGGCATCCG ATGCCATCTG AAAACGCCTG 

251 CCATGATGCA NGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAA 

301 CCTTTATCCA ACGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACNAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG TGCCCCCGAA GACTTGCAAG 

551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 

601 TGGAACCAAT CCCCCGANNA NACCAGCACC CTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 

701 TGCTACACCC GCGCTATCCG CTNTACATCG CCCCGAAAGA AAACCNCGTC 

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CACCTGCCAG 

801 CGTGCGTTCC GGGCTGGAAC TCTTATCCGC ACTCTATGCC GTCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 

901 CTCAATCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 

951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGANGCG 

1051 CCCGAACGCG ATGAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 

1101 A 

This encodes a protein having amino acid sequence <SEQ ID 814>:

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA
51 PAAEAVEATP EVVRLGRQXI PLWRGIRCHL KTPAMMXENG SLIVWHGQDK
101 PLSNEFVRHL KRGGVADDXI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ
151 LDGRQILSAL ADALDELNVP CHWEHECAPE DLQAQYDWLI DCRGYGAKTA
201 WNQSPXXTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENXV
251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIATGLRPT
301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKXA
351 PERDEESGLA YIRRQD*



ORF126a and ORF126-1 show 95.4% identity in 366 aa overlap:

                    10        20        30        40        50        60
orf126a.pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf126-1    MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf126a.pep EVVRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI
            |||||||| ||||||||| | |  || |||||||||||||||| |||||||||||||| |
orf126-1    EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf126a.pep VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||
orf126-1    VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf126a.pep DLQAQYDWLIDCRGYGAKTAWNQSPXXTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP
             ||||||||||||||||||||||||  |||||||||||||||||||||||||||||||||
orf126-1    GLQAQYDWLIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf126a.pep LYIAPKENXVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIATGLRPT
            |||||||| ||||||||||||||||||||||||||||||| |||||||||||||||||||
orf126-1    LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf126a.pep LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKXAPERDEESGLA
            ||||||||||||||||||||||||||||||||||||| |||||||||| ||||| |||||
orf126-1    LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDKESGLA
                   310       320       330       340       350       360

orf126a.pep YIRRQDX
            |||||||
orf126-1    YIRRQDX

Homology with a predicted ORF from N. gonorrhoeae 

ORF126 shows 90% identity over a 180 aa overlap with a predicted ORF (ORF126ng) from
N. gonorrhoeae:

orf126.pep  MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 60
            ||||| ||||||||||||||||||||| ||||  | |||||||||||||||||  |||||
orf126ng    MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 60

orf126.pep  EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 120
            || ||||||||||||||||||| |||||||||||||||||||||||||||||||  ||||
orf126ng    EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 120

orf126.pep  VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 180
            |||||| |||||||||||| |||||||| ||||||  |||||||||||||||||||| | 
orf126ng    VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ 180



An ORF126ng nucleotide sequence <SEQ ID 815> was predicted to encode a protein having amino
acid sequence <SEQ ID 816>:

1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA
51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK
101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ
151 LDGRQILSAL ADALDELNVP CHWSHECAPQ DLQAQYDWVI DCRGYGAKTA
201 WNQSPEHTST LRGIRGEVRG FTRPKSRSTA PCACCTRAIR STSPRKKTTS
251 SSSARPKSKA KAKPPPAYVP GWNSYPRSMP STPPSAKPTS SKWRPGLRPT
301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA
351 PERDEESGLA YIGRQD*



Further work revealed the following gonococcal DNA sequence <SEQ ID 817>:

1 ATGACCCGTA TCGCCGTCCT CGGAGGCGGC CTTTCCGGAA GGCTGACCGC
51 ATTGCAGCTT GCAGAACAAG GTTATCAGAT TGAACTTTTC GACAAGGGCA
101 CCCGCCAAGG CGAACACGCC GCCGCCTATG TTGCCGCCGC GATGCTCGCG
151 CCTGCGGCGG AAGCGGTCGA GGCAACGCCC GAAGTCATCA GGCTGGGCAG
201 GCAGAGCATT CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCTCA
251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG
301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA
351 TGACGAAATC GTCCGTTGGC GCGCCGATGA AATCGCCGAA CGCGAACCGC
401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG
451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT
501 GAACGTCCCT TGCCATTGGG AACACGAATG CGCCCCCCAA GACCTGCAAG
551 CCCAATACGA CTGGGTAATC GACTGCCGGG GCTACGGCGC GAAAACCGCG
601 TGGAACCAAT CCCCCGAGCA CACCAGCACC TTGCGCGGCA TACGCGGCGA
651 AGTGGCGCGG GTTTACACGC CCGAAATCAC GCTCAACCGC CCCGTGCGCC
701 TGCTGCACCC GCGCTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC
751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG
801 CGTACGTTCC GGGCTGGAAC TCTTATCCGC GCTCTATGCC GTCCACCCCG
851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCGCCGGCCT GCGCCCCACG
901 CTCAACCACC ACAACCCCGA AATCCGCTAC AGCCGCGAAC GCCGCCTCAT
951 CGAAATCAAC GGCCTTTTCC GGCACGGCTT TATGATTTCC CCCGCCGTAA
1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG
1051 CCCGAACGTG ATGAAGAAAG CGGTTTGGCG TATATCGGAA GACAAGATTA
1101 A



This corresponds to the amino acid sequence <SEQ ID 818; ORF126ng-1>:

1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA
51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK
101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ
151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA
201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV
251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIAAGLRPT
301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA
351 PERDEESGLA YIGRQD*



ORF126ng-1 and ORF126-1 show 95.1% identity in 366 aa overlap:

                     10        20        30        40        50        60
orf126-1.pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP
             ||||| ||||||||||||||||||||| ||||| | ||||||||||||||||||||||||
orf126ng-1   MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf126-1.pep EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI
             || ||||||||||||||||||| |||||||||||||||||||||||||||||||||||||
orf126ng-1   EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf126-1.pep VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE
             |||||| |||||||||||||||||||||||||||||||||||||||||||||||||| | 
orf126ng-1   VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf126-1.pep GLQAQYDWLIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP
              ||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf126ng-1   DLQAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf126-1.pep LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT
             |||||||||||||||||||||||||||||||||||||||| ||||||||||||| |||||
orf126ng-1   LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPT
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf126-1.pep LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDKESGLA
             |||||||||| | |||||||||||||||||||||||| |||||||||||||||| |||||
orf126ng-1   LNHHNPEIRYSRERRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKDAPERDEESGLA
                    310       320       330       340       350       360

orf126-1.pep YIRRQDX
             || ||||
orf126ng-1   YIGRQDX

Furthermore, ORF126ng-1 shows homology to a putative Rhizobium oxidase flavoprotein:

gi|2627327 (AF004408) putative amino acid oxidase flavoprotein [Rhizobium etli]
Length = 327
Score = 169 bits (423), Expect = 3e-41

Identities = 112/329 (34%), Positives = 163/329 (49%), Gaps = 25/329 (7%)

Query: 3 RIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHXXXXXXXXXXXXXXXXXXXXXXX 62 

RI V G G++G A QL G+++ L ++ G 
Sbjct: 2 RI LVNGAGVAGLTVAWQLYRHGFRVT LAERAGTVGA-GASG FAGGMLAPWCERE S AEEPV 60 

Query: 63 IRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEIVR 122 

+ LGR + W + • G+L+V G+D F R G DE+ 

Sbjct: 61 LTLGRLAADWWEAA LPGHVHRRGTLWAGGRDTGELDRFSRRTS-GWEWLDEVA- 113 

Query: 123 WRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQDL 182 

IA EP L GRF ++ E LD RQ L+ALA L++ + + 
Sbjct: 114 IAALEPDLAGRFRRALFFRQEAHLDPRQALAALAAGLEDARMRLTLG WGES 165 

Query: 183 QAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYPLY 242 

+D V+DC G LRG+RGE+ V T E++L+RPVRLLHPR+P+Y 

Sbjct: 166 DVDH DRWDCTGAA QIGRLPGLRGVRGEMLCVETTEVSLSRPVRLLHPRHPIY 218 

Query: 243 IAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPTLN 302 

I p++ + F++GAT IES+ P + RS +ELL+A YA+HPAFGEA + E AG+RP 
Sbjct: 219 IVPRDKNRFMVGATMIESDDGGPITARSLMELLNAAYAMHPAFGEARVTETGAGVRPAYP 278 

Query: 303 HHNPEIRYSRERRLIEINGLFRHGFMISP 331 

+ P R ++E R + +NGL+RHGF+++P 
Sbjct: 27 9 DNLP — RVTQEGRT LHVNGLYRHG FLLAP 305 
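
The percentages quoted in the summary line of the alignment above can be reproduced from the raw counts. The sketch below assumes, on the basis of the figures themselves, that the reported values are truncated rather than rounded (163/329 is about 49.5% and is shown as 49%). The function name `truncated_percent` is an illustrative assumption.

```python
def truncated_percent(count, total):
    """Return count/total as a whole-number percentage, truncated."""
    return 100 * count // total


# Counts taken from the summary line of the alignment above.
identities = truncated_percent(112, 329)
positives = truncated_percent(163, 329)
gaps = truncated_percent(25, 329)
print(identities, positives, gaps)  # 34 49 7
```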

This analysis suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 97

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
819>:

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT
51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG
101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA
151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC
201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC
251 GTTTGAATGG AATCGtCGCG CGGG..GCTT TAGACAGTAA ATTCATGTTG
301 AAGGCGGTAG CCATAGATAA AGATAAAAAT CCTTTTATTA TTAAGATGAA
351 TGAAAATCTA GTAACCTTTA aTTTGCAAGA AGTCCGCCAG TTCGTGTAGT
401 GACGGGCTGG ATTATTTTAA AGGAAATGAT AAGGACTGCA AGTTACTTAA
451 GTAG

This corresponds to the amino acid sequence <SEQ ID 820; ORF127>:

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIVA RXALDSKFML
101 KAVAIDKDKN PFIIKMNENL VTFICKKSAS SCSDGLDYFK GNDKDCKLLK
151 *

Further work revealed the following DNA sequence <SEQ ID 821>:

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC 

201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This corresponds to the amino acid sequence <SEQ ID 822; ORF127-1>:

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK*
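
The correspondence between a nucleotide listing and its amino acid listing can be checked by conceptual translation. The following is a minimal sketch that assumes the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative and not part of the original disclosure. Codons containing an undetermined base (written N or "." in the listings) are rendered as X, matching the convention used in the amino acid listings here.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 1; unknown codons become 'X', stops '*'."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


# First 60 nucleotides of SEQ ID 821 as listed above.
first_60 = ("ATGACTGATA ATCGGGGGTT TACGCTGGTT "
            "GAATTAATAT CAGTGGTCTT GATATTGTCT")
print(translate(first_60))  # MTDNRGFTLVELISVVLILS
```

The output agrees with the first twenty residues of ORF127-1, and the final codons of the listing (AAG TAG) translate to K followed by a stop, consistent with the terminal K* of the amino acid listing.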

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF127 shows 98.0% identity over a 150 aa overlap with an ORF (ORF127a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf127.pep  MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
            |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
orf127a     MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf127.pep  GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL
            |||||||||||||||||||||||||||| || ||||||||||||||||||||||||||||
orf127a     GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL
                    70        80         90       100       110

                   130       140       150
orf127.pep  VTFICKKSASSCSDGLDYFKGNDKDCKLLKX
            |||||||||||||||||||||||||||||||
orf127a     VTFICKKSASSCSDGLDYFKGNDKDCKLLKX
          120       130       140       150

The complete length ORF127a nucleotide sequence <SEQ ID 823> is:

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT ACAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC 

201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCCTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 824>: 

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN TVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK*

ORF127a and ORF127-1 show 99.3% identity in 149 aa overlap:

                    10        20        30        40        50        60
orf127a.pep MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN
            |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
orf127-1    MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf127a.pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf127-1    GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
                    70        80        90       100       110       120

                   130       140       150
orf127a.pep TFICKKSASSCSDGLDYFKGNDKDCKLLKX
            ||||||||||||||||||||||||||||||
orf127-1    TFICKKSASSCSDGLDYFKGNDKDCKLLKX
                   130       140       150
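
The identity figure quoted for ORF127a and ORF127-1 can be recomputed from the two amino acid listings. The sketch below assumes the sequences are already aligned and of equal length (no gaps are needed for this pair); the function name `percent_identity` is an illustrative assumption, and the sequences are transcribed from SEQ ID 824 and SEQ ID 822 with the residues at positions 15-16 read as VV, as encoded by the corresponding nucleotide listings.

```python
def percent_identity(a, b):
    """Percentage of aligned positions carrying the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


ORF127_1 = ("MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENA"
            "HFMEKFYLQNGRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLK"
            "AVAIDKDKNPFIIKMNENLVTFICKKSASSCSDGLDYFKGNDKDCKLLK")
ORF127A = ("MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENA"
           "HFMEKFYLQNGRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLK"
           "AVAIDKDKNPFIIKMNENLVTFICKKSASSCSDGLDYFKGNDKDCKLLK")

print(round(percent_identity(ORF127A, ORF127_1), 1))  # 99.3
```

The two listings differ at a single position (A versus T at residue 41), giving 148 identical residues out of 149.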

Homology with a predicted ORF from N. gonorrhoeae

ORF127 shows 97.3% identity over a 150 aa overlap with a predicted ORF (ORF127ng) from
N. gonorrhoeae:

orf127.pep  MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 60
            ||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||
orf127ng    MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAAFLENAHFMEKFYLQN 60

orf127.pep  GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL 120
            |||||||||||||||||||||||||||| || ||||||||||||||||||||||||||||
orf127ng    GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL 119

orf127.pep  VTFICKKSASSCSDGLDYFKGNDKDCKLLK 150
            |||||||||||||| |||||||||||||||
orf127ng    VTFICKKSASSCSDRLDYFKGNDKDCKLLK 149

The complete length ORF127ng nucleotide sequence <SEQ ID 825> is: 

1 ATGACTGATA ATCGGGGGTT TACACTGGTT GAATTAATAT CAGTGGTCTT

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC 

201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 826>: 

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAAFLENA

51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDRLDYFKG NDKDCKLLK*

ORF127ng and ORF127-1 show 100.0% identity in 149 aa overlap:

                     10        20        30        40        50        60
orf127-1.pep MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf127ng-1   MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf127-1.pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf127ng-1   GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
                     70        80        90       100       110       120

                    130       140       150
orf127-1.pep TFICKKSASSCSDGLDYFKGNDKDCKLLKX
             ||||||||||||||||||||||||||||||
orf127ng-1   TFICKKSASSCSDGLDYFKGNDKDCKLLKX
                    130       140       150



This analysis, including the fact that the predicted transmembrane domain is shared by the
meningococcal and gonococcal proteins, suggests that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

Example 98

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 827>:

1 . GTGTCGCTGG CTTCGGTGAT TGCCTCTCAA ATCTTCCTTT ACGAAGATTT 

51 CAACCAAATG CGGAAAACCC GTGGAGCTAT CTGCGGTTTT CTTGTCCAAT 

101 ATTTATCTGG GGTTTCAGCA GGGGTATTTC GATTTGAGTG CCGACGAGAA 

151 CCCCGTACTG CATATCTGGT CTTTGGCAGT AGAGGAACAG TATTACCTCC

201 TGTATCCCCT TTTGCTGATA TTTTGCTGCA AAAAAACCAA ATCGCTACGG 

251 GTGCTGCGTA ACATCAGCAT CATCCTGTTT TTGATTTTGA CTGCCTCATC 

301 GTTTTTGCCA AGCGGGTTTT ATACCGACAT CCTCAACCAA CCCAATACTT 

351 ATTACCTTTC GACACTGAGG TTTCCCGAGC TGTTGGCAGG TTCGCTGCTG 

401 GCGGTTTACG GGCAAACGCA AAACGGCAGA CGGCAAACAG CAAATGGAAA

451 ACGGCAGTTG CTTTCATCAC TCTGCTTCGG CGCATTGCTT GCCTGCCTGT

501 TCGTGATTGA CAAACACAAT CCGTTTATCC CGGGAATGAC CCTGCTCCTT 

551 CCCTGCCTGC TGACGGCACT GCTTATCCGG AGTATGCAAT ACGGGACACT 

601 TCCGACCCGC ATCCTGTCGG CAAGCCCCAT CGTATTTGTC GGCAAAATCT 

651 CTTATTCCCT ATACCTGTAC CATTGGATTT TTATTGCTTT CGCTCCGCTC

701 ATTAGAGGCG GGAAACAGCT CGGACTGCCT GCCG. . 

This corresponds to the amino acid sequence <SEQ ID 828; ORF128>: 

1 . . VSLASVIASQ IFLYEDFNQM RKTVELSAVF LSNIYLGFQQ GYFDLSADEN 

51 PVLHIWSLAV EEQYYLLYPL LLIFCCKKTK SLRVLRNISI ILFLILTASS 

101 FLPSGFYTDI LNQPNTYYLS TLRFPELLAG SLLAVYGQTQ NGRRQTANGK

151 RQLLSSLCFG ALLACLFVID KHNPFIPGMT LLLPCLLTAL LIRSMQYGTL 

201 PTRILSASPI VFVGKISYSL YLYHWIFIAF APLIRGGKQL GLPA. . 

Further work revealed the complete nucleotide sequence <SEQ ID 829>: 

1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC 

151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCCTTTATT GCGGCCGTGT 

251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA

351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT

451 CCCCTTTTGC TGATATTTTG CTGCAAAAAA ACCAAATCGC TACGGGTGCT 

501 GCGTAACATC AGCATCATCC TGTTTTTGAT TTTGACTGCC TCATCGTTTT 

551 TGCCAAGCGG GTTTTATACC GACATCCTCA ACCAACCCAA TACTTATTAC

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC 

701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG 

751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG

801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC

1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT
1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGAG
1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC
1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC
1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT
1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCTGTGCCGA
1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG
1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA
1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT
1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC
1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA
1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG
1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT
1801 TATATGGGGC GGGAATTCCA CAAACACGAA CGCCTGCTTA AATCTTCCCA
1851 CGGCGGCGCA TTGCAGTAG

This corresponds to the amino acid sequence <SEQ ID 830; ORF128-1>:

1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT
51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN
101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY
151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA SSFLPSGFYT DILNQPNTYY
201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV
251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR
351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH
401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG
551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY
601 YMGREFHKHE RLLKSSHGGA LQ*
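
A simple consistency check between a complete nucleotide listing and its amino acid listing is that the coding length, including the stop codon, is three times the number of residues plus one codon. The sketch below is illustrative only; the function name `expected_protein_length` is an assumption, and the lengths used are read off the final row labels of the complete listings in Examples 96 to 98 (SEQ ID 811, 821 and 829).

```python
def expected_protein_length(nucleotide_count):
    """Residues encoded by a complete ORF that ends in a stop codon."""
    if nucleotide_count % 3:
        raise ValueError("ORF length is not a multiple of three")
    return nucleotide_count // 3 - 1  # the final codon is the stop


# Nucleotide lengths of the complete ORF126-1, ORF127-1 and ORF128-1 listings.
for name, length in [("ORF126-1", 1101), ("ORF127-1", 450), ("ORF128-1", 1869)]:
    print(name, expected_protein_length(length))
```

The resulting values (366, 149 and 622 residues) agree with the overlap lengths and the final row positions given in the corresponding amino acid listings.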

Computer analysis of this amino acid sequence gave the following results: 
Homology with hypothetical integral membrane protein HI0392 of H. influenzae (accession number U32723)
ORF128 and HI0392 show 52% aa identity in 180aa overlap:

Orf128: 1 VSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGFQQGYFDLSADENPVLHIWSLAV 60
++L S IAS IF+Y DFN++RKT+EL+ FLSN YLG QGYFDLSA+ENPVLHIWSLAV
HI0392: 46 MALVSFIASAIFIYNDFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAV 105

Orf128: 61 EEQXXXXXXXXXIFCCKKTKSLRVLRNISIILFLILTASSFLPSGFYTDILNQPNTYYLS 120
E Q I KK + ++VL I++ILF IL A+SF+ + FY ++L+QPN YYLS
HI0392: 106 EGQYYLIFPLILILAYKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLS 165

Orf128: 121 TLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLCFGALLACLFVIDKHNPFIPGMT 180
LRFPELL GSLLA+Y N + Q + +L+ L L +CLF+++ + FIPG+T
HI0392: 166 NLRFPELLVGSLLAIYHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224

Homology with a predicted ORF from N. meningitidis (strain A)
ORF128 shows 98.0% identity over a 244 aa overlap with an ORF (ORF128a) from strain A of N.
meningitidis:

                                                  10        20        30
orf128.pep                                VSLASVIASQIFLYEDFNQMRKTVELSAVF
                                          ||||||||||||||||||||||||||||||
orf128a     ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVF
                  60        70        80        90       100       110

                    40        50        60        70        80        90
orf128.pep  LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128a     LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI
                 120       130       140       150       160       170

                   100       110       120       130       140       150
orf128.pep  ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK
            |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf128a     ILFLILTATSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK
                 180       190       200       210       220       230

                   160       170       180       190       200       210
orf128.pep  RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128a     RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI
                 240       250       260       270       280       290

                   220       230       240
orf128.pep  VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA
            |||||||||||||||||||||  | | |||||||
orf128a     VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR
                 300       310       320       330       340       350

orf128a     KMTFKKAFFCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSH
                 360       370       380       390       400       410



The complete length ORF128a nucleotide sequence <SEQ ID 83 1> is: 

1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC
51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG
101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC
151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT
201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT
251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC
301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA
351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG
401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT
451 CCTCTTTTGC TGATATTTTG CTGCAAAAAA ACAAAATCGC TACGGGTGCT
501 GCGTAACATC AGCATCATCC TATTTCTGAT TTTGACTGCC ACATCGTTTT
551 TGCCAAGCGG GTTTTATACC GATATTCTCA ACCAACCCAA TACTTATTAC
601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT
651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC
701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG
751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG
801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA
851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT
901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC
951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA
1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA
1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC
1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC
1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT
1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG
1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC
1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC
1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT
1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA
1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG
1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA
1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT
1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC
1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA
1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG
1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT
1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTTA AATCTTCTCG
1851 CGACGGCGCA TTGCAGTAG



This encodes a protein having amino acid sequence <SEQ ID 832>:

1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT
51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN
101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY
151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA TSFLPSGFYT DILNQPNTYY
201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV
251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR
351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH
401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG
551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY
601 YMGREFHKHE RLLKSSRDGA LQ*
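
The correspondence between a nucleotide listing and its amino acid listing can be spot-checked by translating the reading frame with the standard genetic code. The following is a minimal sketch and not part of the original analysis: `translate` is a hypothetical helper, the standard codon table is assumed, and the input is the first 30 bases of the ORF128a sequence <SEQ ID 831> listed above.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def translate(dna):
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 30 bases of <SEQ ID 831>, spaces kept as in the listing.
orf128a_start = "ATGCAAGCTG TCCGATACAG ACCGGAAATT"
print(translate(orf128a_start))  # MQAVRYRPEI
```

The result agrees with the first ten residues of <SEQ ID 832>. The same helper applied to the last nine bases of the listing (TTGCAGTAG) gives LQ followed by the stop codon.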

ORF128a and ORF128-1 show 99.5% identity in 622 aa overlap: 

orfl28a oeP MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG 
" P I I I I I I I I t I I I | | | | | || I I I II I I I M I I I II II I II I I M II I I I MMN " Nil I 

orf 128-1 MQAVRYRPEIDGLRAVAVLSVMIFHLNKRWLPGGFLGVDIFFVISGFLITGIILSEIQNG 

or f 128a pep SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF 

I I t I t I I i I I I 1 1 1 1 I I I I I I 1 I I I I 1 I t I I * I I I I I K I 1 < I I 1 I I * <>> 1 1 1 1 S 1 1 1 1 1 
orf 128-1 SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF 

orfl28a pep QQGYFDLSADENPVI^IWSIAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA 
I | | I I I II I I I I t I I I I II I I I I M II M I I I I I I i I I I I I I I I t I I M I I I I I I I I III 
orf 128-1 QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA 

orf!28a pep TSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC 
: I I I I I I I I 1 1 1 1 I I 1 1 I 1 1 1 I I i 1 M II I I I I I I I I I I I M I II I I I I 1 M I I I I I I I I 
orf 128-1 SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC 

orf 128a . pep FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 
I 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 I I 1 1 1 1 I 1 1 1 1 

orf 128-1 FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 

orf 128a . pep SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF 
I I 1 1 1 1 I I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I I • I 1 1 1 1 I I I I 1 I 1 I I I I I 1 1 I MIMMIIllll 
orf 128-1 SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF 

orf 12 8a . pep FCLY1APSLILVGYNLYARGIUCQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL 

I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I M I I I I I I I I 1 I I M M 1 I I I I I 
orf 128-1 FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPIAAENHFPETVLTLGDSHAGHLRGFL 

or f 128a . pep DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 

I I 1 I 1 1 I t M I 1 1 I I 1 I I 1 1 M ! I I 1 I I I I I 1 t 1 I I I I t 1 I t I 1 1 I I I I I I 1 I I 

orf 128-1 DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 

orf 128a . pep PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL 
I M I II I I I I I II 1 1 I I II I I I M I I I I I I I I I M I I I I I I I I I I I M II II I I I 1 I M I 
or f 1 2 8 - 1 PVPRFEAQSFLI PGFPARFRETVKRIAAVKPVYVFANNTS I SRS PLREEKLKRFAANQYL 

orf 128a. pep RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY 
I III INI llltl II milllll Ml M I I ! Ml I I M I! M I I I MIMIMIIIIII 
orf 128-1 RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY 

or f 128a . pep YMGRE FHKHERLLKS SRDGALQX 
Mllllllllllllll: HMI 
orf 128-1 YMGRE FHKHERLLKSSHGGALQX 
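
A percent identity figure of the kind quoted above is the share of aligned columns in which both sequences carry the same residue. The sketch below illustrates the arithmetic only; the program used for the alignments is not named in this section, `percent_identity` is a hypothetical helper, and the inputs are the final alignment block with the scanning spaces and the terminal X removed.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned columns holding identical residues (no gaps)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Last block of the ORF128a / ORF128-1 alignment as printed above.
orf128a_tail = "YMGREFHKHERLLKSSRDGALQ"
orf128_1_tail = "YMGREFHKHERLLKSSHGGALQ"
print(round(percent_identity(orf128a_tail, orf128_1_tail), 1))  # 90.9
```

Over this 22-residue block two columns differ (R/H and D/G), so the local figure is lower than the 99.5% reported for the full 622 aa overlap.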



Homology with a predicted ORF from N. gonorrhoeae 
ORF128 shows 93.4% identity over 244 aa overlap with a predicted ORF (ORF128ng) from N.
gonorrhoeae:

orf 128. pep 
orfl28ng 
orf 128. pep 
orf 128ng 
orf 128 .pep 
orfl28ng 
orf 128 .pep 
orfl28ng 



VSLASVIASQIFLYEDFNQMRKTVELSAVF 30 

I 1 1 I 1 1 II 1 1 1 1 I I I I I I I I I I 1 : 1 1 1: 1 1 

ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVF 112 

LSNIYLGFQQGYFDLSAOENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 90 
Milllll: M I I I I I I I I I I I I M I M I M I M I M I M M M MIMIMIIIIII 

l^NIYl^FRl^FDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISI 172 

ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 150 

I M MM I Mi 11:111 1 I I 1 I I I 1 I I I I I I I : I I t I I I I I f I I I I I I I I Ml 

ILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGK 232 

RQLLSSLCFGALLACLFVIDKHNPFI PGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI 210 
Mill II II II I : I I I I I I I I M II I I M II M I I M M M I M I M M I I M M II II 

RQLLSLLCFGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPI 292 
orf128.pep   VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA 244

orf128ng     VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR 352

The complete length ORF128ng nucleotide sequence <SEQ ID 833> is:

1 ATGCAAGCTG TCCGATACAG GCCTGAAATT GACGGATTGC GGGCCGTCGC
51 CGTGCTATCC GTCATTATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG
101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCGGGATT CCTCATTACC
151 AACATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT
201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT
251 CCCTGGCTTC GGTGATTGCT TCTCAAATCT TCCTTTACGA AGATTTCAAC
301 CAAATGAGGA AAACCATAGA GCTTTCTACG GTTTTTTTGT CCAATATTTA
351 TTTGGGGTTC CGATTGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG
401 TACTGCATAT CTGGTCTTTG GCGGTAGAGG AACAGTATTA CCTCCTGTAT
451 CCTCTTTTGC TGATATTCTG TTACAAAAAA ACCAAATCAC TACGGGTGCT
501 GCGTAATATC AGCATCATCC TGTTTCTGAT TTTGACCGCA TCATCGTTTT
551 TGCCGGCCGG GTTTTATACC GACATCCTCA ACCAACCcaa TACTTATTAC
601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GTGGGTTCGC TGTTGGCGGT
651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGAAAAT GGAAAACGGC
701 AGTTGCTTTC ATTACTCTGT TTCGGCGCat tgCTTGTCTG CCTGTTCGTG
751 ATCGACAAAC ACGATCCGTT TATCCCGGGA ATAACCCTGC TCCTTCCCTG
801 CCTGCTGACG GCGCTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA
851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT
901 TCCCTATACC TGTACCATTG GATTTTTATT GCCTTCGCCC ATTACATTAC
951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA
1001 CGGCCGGATT TTCCCTGTTG AGCTATTATT TGATTGAACA GCCGCTTAGA
1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTTT ATCTCGCCCC
1101 GTCCCTGATG CTTGTCGGTT ACAACCTGTA TTCAAGAGGG ATATTGAAAC
1151 AGGAACACCT CCGCCCGCTG CCCGGCACGC CCGTTGCTGC GGAAAATAAT
1201 TTTCCGGAAA CCGTCTTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG
1251 GGGGTTTCTG GATTATGTCG GCGGCAGGGA AGGGTGGAAA GCTAAAATCC
1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TGGATGAGAA GCTGGCAGAC
1351 AACCCGTTGT GCCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCTGT
1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA
1451 GATTTGAAGC GCAATCCTTC CTGATACCCG GGTTCAAAGC CCGATTCAGG
1501 GAAACCGTCA AGAGGATAGC CGCCGTCAAA CCTGTATATG TTTTTGCAAA
1551 CAATACATCA ATCAGCCGTT CTCCCTTGAG GGAGGAAAAA TTGAAAAGAT
1601 TTGCTATAAA CCAATACCTC CGGCCTATTC GGGCTATGGG CGACATCGGC
1651 AAGAGCAATC AGGCGGTCTT TGATTTGGTT AAAGATATTC CCAATGTGCA
1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATACACG
1751 GACGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT
1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTCA AGCATTCCCG
1851 AGGCGGCGCA TTGCAGTAG

This encodes a protein having amino acid sequence <SEQ ID 834>: 

1 MQAVRYRPEI DGLRAVAVLS VIIFHLNNRW LPGGFLGVDI FFVISGFLIT
51 NIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN
101 QMRKTIELST VFLSNIYLGF RLGYFDLSAD ENPVLHIWSL AVEEQYYLLY
151 PLLLIFCYKK TKSLRVLRNI SIILFLILTA SSFLPAGFYT DILNQPNTYY
201 LSTLRFPELL VGSLLAVYGQ TQNGRRQTEN GKRQLLSLLC FGALLVCLFV
251 IDKHDPFIPG ITLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR
351 KRKMTFKKAF FCLYLAPSLM LVGYNLYSRG ILKQEHLRPL PGTPVAAENN
401 FPETVLTLGD SHAGHLRGFL DYVGGREGWK AKILSLDSEC LVWVDEKLAD
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFKARFR
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAINQYL RPIRAMGDIG
551 KSNQAVFDLV KDIPNVHWVD AQKYLPKNTV EIHGRYLYGD QDHLTYFGSY
601 YMGREFHKHE RLLKHSRGGA LQ*

ORF128ng and ORF128-1 show 95.7% identity in 622 aa overlap: 

orf 128-1 pep MQAVRYRPEIDGLRAVAVLSVMIFHUWRWLPGGFLGVDIFFVISGFLITGIILSEIQNG 
60 ^ " | 1 1 | 1 1 I I I I I I I I I I I I I I I : I I I I I I M I i I I I I M I I I I I I I I 1 I I !• I I I I I M i I 

orfl28ng MQAVRYRPEIDGLRAVAVLSVIIFHLNNRWLPGGFLGVDIFFVISGFLITNIILSEIQNG 

orf 128-1. pep SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNOMRKTVELSAVFLSNIYLGF 

I 1 1 M 1 I t I i I I 1 I 1 1 1 I 1 I I 1 I I i I I 1 IIIMIIIMM:|ll:IIMII!ltt 

65 orfl28ng SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQI FLYEDFNQMRKTIELSTVFLSNIYLGF 
orf 128-1. pep
orf!28ng 
orf 128-1. pep 
orfl28ng 
orf 128-1. pep 
orf 128ng 
orf 128-1. pep 

orfl28ng 

orf 128-1. pep 

orfl28ng 

orf 128-1. pep 

orfl28ng 

orf 128-1. pep 

orfl28ng 

orf 128-1. pep 

orfl28ng 

orf 128-1. pep 

orfl28ng 



(XX5YFDLSADENPVUiIWSlAVEEQYYLLYPI^LlFCCKKTKSLRVLRNISIILFLILTA 

: 1 1 I 1 I | t I I I | | I I I 1 1 1 I I I I I * i « I I 1 1 I I I 1 I I I I I I t I I 1 I I i I i I I I I I 1 i t 
RLGYFDLSADENPVLHIWSIAVEEQYYLLYPIXLIFCYKKTKSLRVLRNISIILFLILTA 

SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGOTQNGRRQTANGKRQLLSSLC 

I I t I I = I 1 ! I I I t 1 I 1 1 1 1 I I I t I I 1 I I I I r t 1 I I I I I I I t i I I I I J I Mlltlll M 
SSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGKRQLLSLLC 

FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 
I I I M : 1 I I M 1 1 I : I I I I I : I I M I I I I t i I I M I I I I i I M I I I I I I I I I I I I I I I M 
FGALLVCLFVIDKHDPFIPGITLLLPCIXTALLIRSMQYGTLPTRILSASPIVFVGKISY 

SLYLYHW I FIAFAHY ITGDKQLGLPAVS AVAALTAG FSLLS YYLIEQPLRKRKMT FKKAF 

| | || IMIMI II II II I I II III II II I I II I IN Mil I II III M llllllll MM 
SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLS YYLIEQPLRKRKMT FKKAF. 

FCLYLAPSL I LVGYNLYARG ILKQEHLRPLPGAPLAAENH FPETVLT LGDSHAGHLRGFL 
I I I I II I M : I M I II I : I I M M M 11 I M I M : I I I M M II II M M II M II M M 
FCLYIAPSLMLVGYNLYSRGILKQEHLRPLPGTPVAAENNFPETVLTLGDSHAGHLRGFL 

DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 
||||:|| MIMIMMIMIMMMMMMM II MIMMIMM Ml Mill III 
DYVGGREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 

PVPRFEAQS FLI PGFPARFRETVKRI AAVKPVYVFANNTS I SRS PLREEKLKRFAANQYL 
M M 1 11 M II I I U I II II I M I M I M II II M M M I I M i M M M II M MM 
PV PRFEAQS FLI PG FKARFRET VKRI AAVKPVYVFANNT S I SRS P LREEKLKRFAINQYL 

RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY 
| | t : I M I M M M M I MM M M I II M M M M II II I h II II II M M M M I M 
RPIRAMGDIGKSNQAVFDLVKDIPNVHWVDAQKYLPKNTVEIHGRYLYGDQDHLTYFGSY 

YMGRE FHKHERLLKSSHGGALQX 
M M I II I II I M I I: II M M 
YMGRE FHKHERLLKHSRGGALQX 
610 620 
In addition, ORF128ng shows homology to a hypothetical H.influenzae protein:

sp|P43993|Y392_HAEIN HYPOTHETICAL PROTEIN HI0392 >gi|1074385|pir||B64007
hypothetical protein HI0392 - Haemophilus influenzae (strain Rd KW20)
>gi|1573364 (U32723) H. influenzae predicted coding region HI0392 [Haemophilus
influenzae] Length = 245
Score = 239 bits (604), Expect = 3e-62

Identities = 124/225 (55%), Positives = 152/225 (67%), Gaps = 1/225 (0%)

Query: 38 VDIFFVISGFLITNIILSEIQNGSFSFRDFYTRRIKRIYPXXXXXXXXXXXXXXXXFLYE 97 

+DIFFVISGFLIT II++EIQ SFS + FYTRRIKRIYP F+Y 
Sbjct: 1 MDIFFVISGFLITGIIITEIQQNSFSLKQFYTRRIKRIYPAFITVMALVSFIASAIFIYN 60 

Query: 98 DFNQMRKTIELSTVFLSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQXXXXXXXXXIFC 157 

' DFN++RKTIEL+ FLSN YLG GYFDLS A+EN PVLHI WSLAVE Q I 

Sbjct: 61 DFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAVEGQYYLIFPLILILA 120 

Query: 158 YKKTKSLRVLRNISIILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAV 217 

YKK + ++VL I++ILF IL A+SF+ A FY ++L+QPN YYLS LRFPELLVGSLLA+ 
Sbjct: 121 YKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLSNLRFPELLVGSLLAI 180 

Query: 218 YGQTQNGRRQTENGKRQLLSLLCFGALLVCLFVI DKHDPFI PGIT 262 

Y N + Q +L++L L CLF+++ + FIPGIT 

Sbjct: 181 YHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224 

This analysis, including the identification of several putative transmembrane domains, suggests that 
these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens 
for vaccines or diagnostics, or for raising antibodies. 
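
The percentages in the BLAST summary line above follow directly from the counts over the 225 aligned columns. The sketch below reproduces them; `blast_percent` is a hypothetical helper, and truncation to an integer is an assumption made here because it reproduces 152/225 as 67%, whereas rounding would give 68%.

```python
def blast_percent(count, aligned_length):
    """Integer percentage, truncated (assumed) as in the summary line quoted above."""
    return 100 * count // aligned_length


# Counts from the ORF128ng / HI0392 summary line.
summary = {"Identities": 124, "Positives": 152, "Gaps": 1}
for label, count in summary.items():
    print(f"{label} = {count}/225 ({blast_percent(count, 225)}%)")
```

The loop prints 55%, 67% and 0%, matching the figures reported for the HI0392 hit.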
Example 99

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 835>: 

1 ATTATTTACG MTACCGCTG GATGTTTCTT TACGGCGCAC TGACGACCTT 

51 **GGGGCTGACG GTCGTGGCAA C.GCGGGCGG TTCGGTATTG GGTCTGTTGT 

101 TGGCGTTGGC GCGCCTGATT CACTTGGAAA AAGCCGGTGC GCCGATGCGC 

151 GTGCTGGCGT GGGCGTTGCG TAAAGTTTCG CTGCTGTATG TTACGCTGTT 

201 CCGGGGTACG CCGCTGTTTG TGCAGATTGT GATTTGGGCG TATGTGTGGT 

251 TTCCGTTTTT CGTC. . 

This corresponds to the amino acid sequence <SEQ ID 836; ORF129>: 

1 ..IIYEYRWMFL YGALTTLGLT VVAXAGGSVL GLLLALARLI HLEKAGAPMR
51 VLAWALRKVS LLYVTLFRGT PLFVQIVIWA YVWFPFFV. . 

Further work revealed the complete nucleotide sequence <SEQ ID 837>: 



1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA
51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCAACG GCGGGCGGTT
101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA
151 ccCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AAGTTTCGCT
201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA
251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT
301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT
351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG
401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG
451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT
501 GCCGCAGGCA TTGCGCCGCA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA
551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG
601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC
651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT
701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA

This corresponds to the amino acid sequence <SEQ ID 838; ORF129-1>:

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI
101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA
151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL
201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF129 shows 98.9% identity over an 88 aa overlap with an ORF (ORF129a) from strain A of N.
meningitidis:

                           10        20        30        40        50
orf129.pep         IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW
                   |||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf129a      MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
                     10        20        30        40        50        60

                 60        70        80
orf129.pep   ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV
             ||||||||||||||||||||||||||||||||||
orf129a      ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
                     70        80        90       100       110       120

orf129a      SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
                    130       140       150       160       170       180




The complete length ORF129a nucleotide sequence <SEQ ID 839> is: 

1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA
51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCGACG GCGGGCGGTT
101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA
151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT
201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA
251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT
301 TTGGTTAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT
351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG
401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG
451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT
501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA
551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG
601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC
651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT
701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA

This encodes a protein having amino acid sequence <SEQ ID 840>: 

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI
101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA
151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL
201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129a and ORF129-1 show 100.0% identity in 248 aa overlap:

orf129a.pep  MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129a.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129a.pep  SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129a.pep  EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE

orf129a.pep  KRYNPQHRX
             |||||||||
orf129-1     KRYNPQHRX



Homology with a predicted ORF from N. gonorrhoeae
ORF129 shows 98.9% identity over an 88 aa overlap with a predicted ORF (ORF129ng) from
N. gonorrhoeae:

orf129.pep         IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW 54
                   |||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf129ng     MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW 60

orf129.pep   ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV 88
             ||||||||||||||||||||||||||||||||||
orf129ng     ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVILHTAFLGNAMRQSRRVPDKGRWIAG 120

An ORF129ng nucleotide sequence <SEQ ID 841> was predicted to encode a protein having amino
acid sequence <SEQ ID 842>:

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVILHTAF
101 LGNAMRQSRR VPDKGRWIAG SLELNCQPRG RKTRGEFPPG ESNLGTEPRN
151 PLSMGQRRFP GCENWYPPQN FIKK*

Further work revealed the following gonococcal sequence <SEQ ID 843>: 

1 ATGGATTTTc gtTTTGACAT TATTTAcgaA TACCGCTGGA TGTTTCTTTA
51 CGGCGCACTG Acgaccttgg ggctgacggt cgtggcgacg gCGGGCGGTT
101 CGGtattggG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA
151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT
201 GCTGTACGTT ACCCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA
251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT
301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT
351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG
401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG
451 GCGTGTTCTT TGGGACTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT
501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA
551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG
601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC
651 GCTTTACACC GCCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT
701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA

This corresponds to the amino acid sequence <SEQ ID 844; ORF129ng-1>:

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI
101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA
151 ACSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL
201 AYVQNTITGR YSVYEEPLYT AALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129ng-1 and ORF129-1 show 99.2% identity in 248 aa overlap:

orf129-1.pep  MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129ng-1    MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129-1.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129ng-1    ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129-1.pep  SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
              ||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||
orf129ng-1    SLALIANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129-1.pep  EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
              ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf129ng-1    EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLE

orf129-1.pep  KRYNPQHRX
              |||||||||
orf129ng-1    KRYNPQHRX

In addition, ORF129ng-l is homologous to an ABC transporter from A.fulgidus: 

2650409 (AE001090) glutamine ABC transporter, permease protein (glnP)
[Archaeoglobus fulgidus] Length = 224
Score = 132 bits (329), Expect = 2e-30
Identities = 86/178 (48%), Positives = 103/178 (57%), Gaps = 18/178 (10%)

Query: 65 VSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAGSLAL 124 

+S YV + RGTPL VQI+I +F P+ GI + E A G +AL 

Sbjct: 58 I STAYVEV I RGT PLLVQI L I VYFGLPAIGINLQPEPA GIIAL 99

Query: 125 IANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLASEFIT 184 

SGAYI EI RAGI+SI GQMEAA SLG+TY QAMRYVI PQA R +LP L +EFI 
Sbjct: 100 SICSGAYIAEIVRAGIESIPIGQMEAARSLGMTYLQAMRYVIFPQAFRNILPALGNEFIA 159 

Query: 185 LLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLEKR 242

LLKDS S LLS V I + + EL V I P AL YL+MT L + +K+ 

Sbjct: 160 LLKDSSLLSVISIVELTRVGRQIVNTTFNAWTPFLGVALFYLMMTIPLSRLVAYSQKK 217 

This analysis, including the identification of transmembrane domains in the two proteins, suggests
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful
antigens for vaccines or diagnostics, or for raising antibodies.
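
This section does not state which program identified the transmembrane domains. Purely as an illustration of how hydrophobic stretches can be flagged, the sketch below applies a Kyte-Doolittle sliding window; the scale, the 19-residue window and the 1.6 threshold are assumptions swapped in here, and `hydrophobic_window_starts` is a hypothetical helper rather than the method used for the original analysis.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named in the source).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_window_starts(protein, window=19, threshold=1.6):
    """Return 0-based start indices of windows whose mean hydropathy >= threshold."""
    starts = []
    for i in range(len(protein) - window + 1):
        mean = sum(KD[aa] for aa in protein[i:i + window]) / window
        if mean >= threshold:
            starts.append(i)
    return starts


# First 60 residues of ORF129-1 <SEQ ID 838>, as shown in the alignments above.
orf129_1_start = "MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW"
print(hydrophobic_window_starts(orf129_1_start))
```

Runs of consecutive start indices mark candidate membrane-spanning stretches. A dedicated topology predictor would be needed before drawing any conclusion about a given protein.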
Example 100

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 845>: 



1 CTGAAAGAAT GCCGTCTGAA AGACCCTGTT TTTATTCCAA ATATCGTTTA
51 TAAGAACATC GCCATTACTT TCCTGCTCTT GCACGCCGCC GCCGAACTTT
101 GGCTGCCCGC GCAAACCGCC GGTTTTACCG CGCTCGCCGT CGGCTTCATC
151 CTGCTCGCCA AGCTGCGTGA gCTTCACCAT CACGAACTCT TACGTAAACA
201 cTACGTCCGC ACTTATTACy TGCTCCAACT CTTTGCCGCC GCAGgcJAgT
251 TTGTGGACAG GCGCGGCGwA ATTACAAAAC CTGCCCGCyT CCGCGCCCCT
301 GCACCTGATT ACCCTCGGCG GCATGATGGG CGGCGTGATG ATGGTGTGGc
351 TGACCGCCGG ACTGTGGCAC AGCGGCTTTA CCAAACTCGA CTACCCCAAA
401 CTCTGCCGCA TTGCCGTCCC CATCCTTTTC GCCGCCGCCG TCTCGCGCGC
451 TTTCTTGrTG AACGTGAACC CGrTATTTTT CATTACCGTT CCTGCGATTC
501 TGACCGCCGC CGTATTCGTA CTGTATCTTT TCrCGTTTAT ACCGATATTT
551 CGGGCGAATG CGTTTACAGA CGATCCGGAr TAr



This corresponds to the amino acid sequence <SEQ ID 846; ORF130>:

1 LKECRLKDPV FIPNIVYKNI AITFLLLHAA AELWLPAQTA GFTALAVGFI
51 LLAKLRELHH HELLRKHYVR TYYLLQLFAA AGSLWTGAAX LQNLPASAPL
101 HLITLGGMMG GVMMVWLTAG LWHSGFTKLD YPKLCRIAVP ILFAAAVSRA
151 FLXNVNPXFF ITVPAILTAA VFVLYLFXFI PIFRANAFTD DPE*
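
The partial DNA listing <SEQ ID 845> contains lower-case letters such as r, y and w, and the deduced protein contains X residues. One plausible reading, treated here as an assumption, is that these letters are IUPAC ambiguity codes and that X is written wherever the possible codons encode different amino acids. The sketch below illustrates that rule for three codons taken from the listing; the reading frame is inferred by comparison with the complete sequence <SEQ ID 847>, and `translate_codon` is a hypothetical helper.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Return the amino acid if every expansion of the codon agrees, else 'X'."""
    choices = [IUPAC[base] for base in codon.upper()]
    residues = {CODON_TABLE["".join(p)] for p in product(*choices)}
    return residues.pop() if len(residues) == 1 else "X"


# Codons read from the <SEQ ID 845> listing (frame assumed as described above).
for codon in ("GCy", "rTG", "rTA"):
    print(codon, translate_codon(codon))
```

GCy resolves to alanine because both GCC and GCT encode it, whereas rTG (ATG or GTG) and rTA (ATA or GTA) remain ambiguous. This is consistent with the A in LPASAPL and the two X residues in FLXNVNPXFF.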

Further work revealed the complete nucleotide sequence <SEQ ID 847>:

1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT
51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT
101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG
151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT
201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA
251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC
301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT
351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG
401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG
451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA
501 ATGCCGTCTG AAAGACCCTG TTTTTATTCC AAATATCGTT TATAAAAACA
551 TCGCCATTAC TTTCCTGCTC TTGCACGCCG CCGCCGAACT TTGGCTGCCC
601 GCGCAAACCG CCGGTTTTAC CGCGCTCGCC GTCGGCTTCA TCCTGCTCGC
651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CTTACGTAAA CACTACGTCC
701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA
751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT
801 TACCCTCGGC GGCATGATGG GCGGCGTGAT GATGGTGTGG CTGACCGCCG
851 GACTGTGGCA CAGCGGCTTT ACCAAACTCG ACTACCCCAA ACTCTGCCGC
901 ATTGCCGTCC CCATCCTTTT CGCCGCCGCC GTCTCGCGCG CTTTCTTGAT
951 GAACGTGAAC CCGATATTTT TCATTACCGT TCCTGCGATT CTGACCGCCG
1001 CCGTATTCGT ACTGTATCTT TTCACGTTTA TACCGATATT TCGGGCGAAT
1051 GCGTTTACAG ACGATCCGGA ATAA

This corresponds to the amino acid sequence <SEQ ID 848; ORF130-1>: 

1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL
51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC
101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSILL GAEALKECRL KDPVFIPNIV YKNIAITFLL LHAAAELWLP
201 AQTAGFTALA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT
251 GAAKLQNLPA SAPLHLITLG GMMGGVMMVW LTAGLWHSGF TKLDYPKLCR
301 IAVPILFAAA VSRAFLMNVN PIFFITVPAI LTAAVFVLYL FTFIPIFRAN
351 AFTDDPE*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF130 shows 94.3% identity over a 193 aa overlap with an ORF (ORF130a) from strain A of N.
meningitidis:

orf 130. pep 



orfl30a 



orf 130. pep 
orfl30a 



orf 130. pep 
orf!30a 



orfl30.pep 
orfl30a 



10 2C 30 

LKECRLKDPV FI PN I V YKN IAITFLLLHAA 
! I I I 1 1 1 1 I I I I 1 1 * 1 1 1 1 M 1 1 1 K 1 1 1 1 1 
LNLLRAQVHLHMAAVMFVSVRVSILI£AEAU<ECRLKDP 

140 150 160 170 18C 190 

40 50 60 70 80 90 

AELWLPAQT AGFTALAVG FI LLAKLRELHHHELLRKHYVRT YYLLQLFAAAGS LWTGAAX 
M M M I I 1 I t I i : 11 I I M I I I r I I 1 t t 1 I M I I I I 1 I M t M I I I M i » I MM IJ 
AELWLPAQTAGFTSLAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK 

220 230 240 250 



200 



210 



100 110 120 130 140 150 

LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 

I II I I I I 1 I | I | I | { | | | I |: I I 1 1 I I I I I I I I I M M I I 1 I I I I I I I I I M I I I I i II I 
LONLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 

260 270 280 290 300 310 

160 170 180 190 

FLXNVNPXFFITVPAILTAAVFVLYLFXFI PI FRANAFTDDPEX 
I Mil M!IIIIMIIMMMI::I:MMIIMIMMI 
VLMNVNP I FFITVPAILTAAVFVLYLLTFV PI FRANAFTDDPEX 

320 330 340 350 



The complete length ORF130a nucleotide sequence <SEQ ID 849> is: 

1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT
51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT
101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG
151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT
201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA
251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC
301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT
351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG
401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG
451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA
501 ATGCCGTCTG AAAGACCCAG TATTCATCCC CAATGTCGTC TATAAAAACA
551 TCGCCATTAC CTTCCTGCTC CTGCACGCCG CCGCCGAACT TTGGCTGCCT
601 GCGCAAACCG CCGGTTTTAC CTCGCTCGCC GTCGGCTTTA TCCTGCTTGC
651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CCTGCGCAAA CACTACGTCC
701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA
751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT
801 TACCCTCGGT GGCATGATGG GCAGCGTGAT GATGGTGTGG CTGACTGCCG
851 GACTGTGGCA CAGCGGCTTT ACCAAGCTCG ACTACCCGAA ACTCTGCCGC
901 ATCGCCGTCC CCATCCTNTT CGCCGCCGCC GTTTCGCGCG CTGTTTTAAT
951 GAACGTAAAC CCGATATTCT TCATCACCGT CCCCGCAATT CTGACCGCCG
1001 CCGTGTTCGT GCTTTACCTG CTGACATTCG TACCGATCTT TCGGGCGAAC
1051 GCGTTTACAG ACGATCCGGA ATAA

This encodes a protein having amino acid sequence <SEQ ID 850>:

1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL
51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC
101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSILL GAEALKECRL KDPVFIPNVV YKNIAITFLL LHAAAELWLP
201 AQTAGFTSLA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT
251 GAAKLQNLPA SAPLHLITLG GMMGSVMMVW LTAGLWHSGF TKLDYPKLCR
301 IAVPILFAAA VSRAVLMNVN PIFFITVPAI LTAAVFVLYL LTFVPIFRAN
351 AFTDDPE*

ORF130a and ORF130-1 show 98.3% identity in 357 aa overlap: 



orf 130a. pep 
orfl30-l 
orf 130a. pep 
orfl30-l 
orf 130a. pep 



MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL 
M I I M M I I i MMI I I M M I I M I M I I M I I M I I I I I MM I M I I I i M M M I 
MRPFTVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL 

KPVAT LMAALLLAASAI LPFS PQT AS FFVAAYWLVLLLFCARLIWLDRNT DN FALLML.LA 

I | I M I I M 1 M M 1 1 M M I M II M M M II M M I M I 1 1 M I M II I I M M I M I 
KPVATLMAALLLAASAILPFS PQTASFFVAAYWLVLLLFCARLIWLDRNTDN FALLMLLA 

AFTV FQT AYAVSGDLN LLRAQVHLNMAAVMFVS VRV S I LLGAEALKECRLKDP VFI PNW 



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15 



WO 99/24578 

orfl30-l 
orf 130a. pep 
orfl30-l 
orf 130a. pep 
orfl30-l 
orf 130a. pep 
orf!30-l 
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1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I M I M 1 1 1 1 1 1 M H I 1 1 M I IN I \\ M : I 
AFTVFQTAYAVSGDLNUJttQVHLNMAA^ 

YKNIAITFLLUiAAAELWLPAQTAGFTSIAVGFILUU^^ 

||lllllllllllll!lllllll!lM:IMIMIIIl||MttilHIII '"I 

YKNIAITFLLLHAAAELWLPAQTAGFTAIAVGFILIAKL 

LFAAAGYXWTGAAKI^LPASAPLHLITI^MMGSVMMVWLTAGLWHSGFTKLDYPKLCR 
||MliillllMHlltlllllll>ilMIMI:lil|||{illllllllitllllLtl 
LFAAAGYLWTGAAKLQNLPASAPI^LITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR 

IAVPILFAAAVSRAVLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPE 
1 I I I I I I t 1 1 I I I I I f | 1 I I I I I I I 1 I 1 I 1 I 1 1 I I I I t I : I 1 - t I I 1 I I I M I I I I 
IAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRAKAFTDDPE 



Homology with a predicted ORF from N. gonorrhoeae

ORF130 shows 91.7% identity over a 193 aa overlap with a predicted ORF (ORF130ng) from 
N. gonorrhoeae: 
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orfl30.pep 
orf 130ng 
orf 130. pep 
orfl30ng 
orfl30.pep 
orfl30ng 
orfl30.pep 
orfl30ng 



LKECRLKDPVFIPNIVYKNIAITFLLLHAA 30 
I I t I I I III MM llllll III III 
LNLLRAQVHLNMAAVMEVSVRVSVLLGTETLKECRLKDPVFI PNVIYKN IAIT- LLLHAA 201 

AELWLPAQTAGFTALAVGFIL1AKLRELHHKELLRKHYVRTYYLLQLFAAAGSLWTGAAX 90 

I t t I I I 1 1 I 1 I I 1 I I I I 1 I t I I 1 I 1 I I f 1 I I 1 f I 1 1 I I I I I I I I I I I 1 I I I I MIMI 
AELWL PAQT AG FT ALAVG FI LLAKLRE LHHHE LLRKH YVRT Y Y LLQLFAAAG YLWT GAAK 261 

LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 150 

I 1 I I I t I i I 1 I I I I t I I 1 I I I I I I I 1 1 I I I I t 1 I I 1 I I I t I I i I t I I I I 111:11111 
LQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVSILFASAVSRA 321 

FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPE 193 

I I I I I llllll I 1 1 I II I : I I I : : I : II II I I I I I I II I 
VLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPE 364 



An ORF130ng nucleotide sequence <SEQ ID 851> was predicted to encode a protein having amino acid sequence <SEQ ID 852>:

1 MNKFFTHPMR PFFVGAAVLA ILGALVFFHQ PRRYHPAPPN FLGTYAAGCI
51 RRFFDYRFVG PDGFFRQPET CRYFDGGWA CCGCFIAVFT ATCRIFRRRL
101 LAGVAAVLRL ADLARRQHRT LRSVDVTAAF TVFQTAYAVS GDLNLLRAQV
151 HLNMAAVMFV SVRVSVLLGT ETLKECRLKD PVFIPNVIYK NIAITLLLHA
201 AAELWLPAQT AGFTALAVGF ILLAKLRELH HHELLRKHYV RTYYLLQLFA
251 AAGYLWTGAA KLQNLPASAP LHLITLGGMT GGVMMVWLTA GLWHSGFTKL
301 DYPKLCRIAV SILFASAVSR AVLMNVNPIF FITVPEILTA AVFMLYLLTF
351 VPIFRANAFT DDPE*



Further work revealed the following gonococcal DNA sequence <SEQ ID 853>:

1 ATGCGCCCGT TTTTCGTCGG TGCGGCAGTA CTTGCCATAC TCGGTGCGTT
51 GGTGTTTTTT ATCAACCCCG GCGCTATCAT CCTGCACCGC CAAATTTTCT
101 TGGAACTTAT GCTGCCGGCT GCATACGGCG GTTTTTTGAC TACCGCTTTG
151 TTGGACCGGA CGGGTTTTTC AGGCAACCTG AAACCTGCCG CTACTTTGAT
201 GGCGGTGTTG TTGCTTGTTG CGGCTGTTTT ATTGCCGTTT TTACCGCAAC
251 TTGCCGCATT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC
301 GCCTGGCTGA TTTGGCTCGA CCGCAACACC GACAACTTCG CTCTGTTGAT
351 GTTACTTGCC GCATTTACCG TTTTTCAGAC GGCCTATGCC GTCAGCGGCG
401 ATTTGAACTT ACTGCGCGCG CAAGTGCATT TGAATATGGC GGCGGTCATG
451 TTCGTATCCG TCCGCGTCAG CGTCCTTTTG GGCACGGAAA CCCTGAAAGA
501 ATGCCGTCTG AAAGACCCCG TATTCATCCC CAACGTTATC TATAAAAACA
551 TCGCCATCAC CCTGCTGCTG CACGCCGCCG CCGAACTTTG GCTGCCCGCG
601 CAAACCGCCG GTTTTACTGC GCTTGCCGTC GGCTTCATCC TGCTCGCCAA
651 GCTGCGCGAA CTGCACCATC ACGAACTCTT ACGCAAACAC TACGTCCGCA
701 CTTATTACCT GCTCCAGCTC TTTGCCGCCG CAGGTTATCT GTGGACAGGC
751 GCGGCGAAAC TGCAAAACCT GCCCGCCTCC GCGCCCCTGC ACCTGATTAC
801 CCTCGGCGGC ATGACGGGTG GCGTGATGAT GGTGTGGCTG ACTGCCGGAC
851 TGTGGCACAG CGGCTTTACC AAACTCGACT ACCCGAAACT CTGCCGCATC
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901 GCCGTCTCCA TCCTTTTCGC CTCCGCCGTT TCGCGCGCTG TTTTAATGAA 

951 CGTGAATCCG ATATTCTTCA TCACCGTTCC CGAGATTCTG ACCGCCGCCG 

1001 TGTTCATGCT TTACCTGCTG ACGTTCGTAC CGATTTTTCG AGCGAACGCG 

1051 TTTACAGACG ATCCGGAATA A 

This corresponds to the amino acid sequence <SEQ ID 854; ORF130ng-1>:

1 MRPFFVGAAV LAILGALVFF INPGAIILHR QIFLELMLPA AYGGFLTTAL
51 LDRTGFSGNL KPAATLMAVL LLVAAVLLPF LPQLAAFFVA AYWLVLLLFC
101 AWLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSVLL GTETLKECRL KDPVFIPNVI YKNIAITLLL HAAAELWLPA
201 QTAGFTALAV GFILLAKLRE LHHHELLRKH YVRTYYLLQL FAAAGYLWTG
251 AAKLQNLPAS APLHLITLGG MTGGVMMVWL TAGLWHSGFT KLDYPKLCRI
301 AVSILFASAV SRAVLMNVNP IFFITVPEIL TAAVFMLYLL TFVPIFRANA
351 FTDDPE*



ORF130ng-1 and ORF130-1 show 92.4% identity in 357 aa overlap:
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or f 130-1 . peo MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL 
I I I I I 1 | | I | I I II I I I I I 1 I I I I I I : II I I I I I I I I I II I I I I II I : I II I I I I I I I I 
orfl30ng-l MRPFFVGAAVLAILGALVFFINPGAIILHRQIFLELMLPAAYGGFLTTALLDRTGFSGNL 

orf 130-1 . pep KPVAT124AALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA 

I | : I I I I I : I I I : I : : : I I I II I : II I i I II I I II I I II I II I I I I I II I I II I I I I 
orfl30ng-l KPAAT LMAVLLLVAAVLL PFLPQLAAFFVAAYWLVL L L FCAW L I WLORNT DN FALLMLLA 

0^*130-1 . pep AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV 

I I I I I I i I 1 I M I I i I 1 1 I I I I I I I I I I I I I 1 I i 1 I I : I I I : I : I I It I I I I I I I I I I : : 
Orfl30ng-1 AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPKVI 

orf 130-1 .pep YKN I AI T FLLLHAAAE LW L P AQT AG FT ALA VG F I LLAKLRE L HH HELLRKHY VRT YY LLQ 

limn 1 1 1 1 si 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 s 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 s 1 1 1 

orfl30ng-l YKNIAIT-lXUiAAAELWLPAQTAGETALAVGFILlJU<LRELHHHELIJlKHYVRTYYL^ 

orf 130-1 . pep LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR 
I I 1 1 I 1 I J I I I t J I 1 1 I I 1 I I 1 i I I I I I 1 I 1 I 111 II I Mil II II II III II Mill I 
orfl30ng-l LFAAAGYLWTGAAKLQNLPASAPLHLITLpGGMTGGVMMVWLTAGLWHSG ftkldypklcr 

orf 130-1 .pep IAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPEX 
til I I I I : I II 1 I I I I I I I I II I I 1 I I I I I I I i: t M : I i : I I I I I I I I I I I I I I 
orfl30ng-l IAVSILFASAVSRAVLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPEX 

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 101

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 855>: 

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT
51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA
101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT
151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA
201 CGGCAATAGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT
251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG C.TGCGGGCT GGATTGGCGT
301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA
351 CTGCTTGGAA AAG..

This corresponds to the amino acid sequence <SEQ ID 856; ORF131>:

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI
51 GGESPPSLGD YEIPLSDGNS SVRANEYESA QQSYFYRKIG KFEXCGLDWR
101 TRDGKPLIET FKQGGFDCLE K..

Further work revealed the complete nucleotide sequence <SEQ ID 857>: 

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT
51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA
101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT
151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA
201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT
251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGCT GGATTGGCGT
301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA
351 CTGCTTGGAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC
401 GATGGTAA

This corresponds to the amino acid sequence <SEQ ID 858; ORF131-l>: 

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI
51 GGESPPSLGD YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR
101 TRDGKPLIET FKQGGFDCLE KQGLRRNGLS ERVRW*
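
The correspondence between the complete nucleotide sequence <SEQ ID 857> and the amino acid sequence <SEQ ID 858> can be checked by translating the listing in reading frame 1 with the standard genetic code. The following Python sketch is illustrative only and is not part of the original disclosure; the names `CODON_TABLE`, `translate` and `SEQ_ID_857` are assumptions introduced here, and the nucleotides are copied from the listing above.

```python
from itertools import product

# Standard genetic code, with codons enumerated in base order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons not in the table (e.g. with N) become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# SEQ ID 857 as listed above (hypothetical variable name), spaces removed.
SEQ_ID_857 = (
    "ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT "
    "TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA "
    "CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT "
    "GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA "
    "CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT "
    "ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGCT GGATTGGCGT "
    "ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA "
    "CTGCTTGGAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC "
    "GATGGTAA"
).replace(" ", "")

protein = translate(SEQ_ID_857)
print(protein)
```

Under these assumptions the 408-nucleotide listing should translate to the 135-residue ORF131-1 sequence followed by the stop symbol `*`.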

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF131 shows 95.0% identity over a 121 aa overlap with an ORF (ORF131a) from strain A of N. meningitidis:

10 20 30 40 50 60 

or f 131 pep MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 
I | || | I | | M | I I I I II I I I I I I I I I I I I I I I I : I I t I I I I I I I I I I I I I I I I I I M I I 
orfl31a MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED 
20 10 20 30 40 50 60 

70 80 90 100 110 120 

orfl31.pep YE I PLSDGNSSVRANEYESAQQSY FY RKIGKFEXCGLDWRTRDGKPLIET FKQGGFDCLE 

1 1 1 1 1 1 1 1 1 1 1 1 » 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 urn: 

25 orfl31a YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK 

70 80 90 100 110 120 



orfl31.pep K 

30 i 

o r f 1 3 1 a KQGLRRHGLSERVRWX 
130 

The complete length ORF131a nucleotide sequence <SEQ ID 859> is: 

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT
51 TACGGTTGCA GGCTGCCGGT TGGCAGGTTG GTATGAGTGT TCGTCCCTGT
101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT
151 GGCGGCGAGA GTCCTCCGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA
201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT
251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT
301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG AAGGTTTTGA
351 TTGTTTGAAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC
401 GATGGTAA

This encodes a protein having amino acid sequence <SEQ ID 860>: 

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
51 GGESPPSLED YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR
101 TRDGKPLIET FKQEGFDCLK KQGLRRNGLS ERVRW*

ORF131a and ORF131-1 show 97.0% identity in 135 aa overlap: 

orf 131a. pep MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED 
1 I I 1 I I 1 1 I I I 1 1 ! I I I I I 1 I I 1 I 1 1 I I I I 1 t I r I I I I t I I I t t 1 I I I I 1 1 1 I I 1 I I I I 
50 orf 131-1 MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 

or f 1 31a pep YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK 
I I 1 I I I I I I t I I I ! i I k I 1 t I I I I I I 1 M I I I I I I I t I 1 I I M I 1 K I ! I I I I I Mill: 
orf 131-1 YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE 



55 



or f 131a . pep KQGLRRKGLSERVRWX 
I I I I I 1 I I I 1 1 1 1 1 1 1 
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orf 131-1 KQGLRRNGLSERVRWX 

Homology with a predicted ORF from N. gonorrhoeae

ORF131 shows 89.3% identity over a 121 aa overlap with a predicted ORF (ORF131ng) from N. gonorrhoeae:

orfl31 pep MEIRAIKYTAMAALLAETVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 60 

IIICIIMI I ||: I Mill III II II II 11:11111 II II I MM II Mill M I 
orfl31ng MEIRVIKYTATAALFAFTVAGCRLAGWYECLSLSGWCKPRKPAAIDFWDIGGESPLSLED 60 

10 orfl31.pep YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE 120 

I I ) I | | | | I I I I I II i M I I : I I I S i I I I I I I I I I I M I II II M : I I I I I I I I I I 
orfl31ng YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE 120 

orfl31.pep K 121 
I 

orfl31ng KQGLRRNGLSERVRW 134 

A complete length ORF131ng nucleotide sequence <SEQ ID 861> was predicted to encode a 
protein having amino acid sequence <SEQ ID 862>: 

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC LSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW*

Further work revealed the following gonococcal DNA sequence <SEQ ID 863>: 

1 ATGGAAATTC GGGTAATAAA ATATACGGCA ACGGCTGCGT TGTTTGCATT
51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCTTGT
101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT
151 GGCGGCGAGA GtccgctGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA
201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCG CAAAAATCTT
251 ACTTTTATAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT
301 ACGCGTGACG GCAAACCTTT GGTTGAGAGG TTCAAACAGG AAGGTTTCGA
351 CTGTTTGGAA AAGCAGGGGT TGCGGCGCAA CGGCCTGTCC GAGCGCGTCC
401 GATGGTAA

This corresponds to the amino acid sequence <SEQ ID 864; ORF131ng-1>:

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW*

ORF131ng-1 and ORF131-1 show 92.6% identity in 135 aa overlap:

orf 131ng-l . pep MEIRVIKYTATAALFAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPLSLED 
1111:11111 I IS : M I I I I I I I I 1 I I I I 1 I I : I I I I I M t I I I I I 1 I I I I I I I II I 
orf 131-1 MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 

orfl31ng-l.pep YE I PLSDGNRSVRANE YES AQKSYFYRKIGKFEACGLDWRTRDGKPLVER FKQEGFDCLE 
! | I I I I I I I I M I I I I I I I i I : I I M I I II I I M I I I M I M I M M : t Ml IIMII 
or^!31-l YEI PLSDGNRSVRANE YESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE 
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45 orfl31ng-l.pep KQGLRRNGLSERVRWX 

1 1 1 1 1 1 II I I M 1 1 1 I 
orf 131-1 KQGLRRNGLSERVRWX 
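
The identity figure quoted for ORF131ng-1 and ORF131-1 can be reproduced by a position-by-position comparison, since the two sequences align without gaps. The Python sketch below is illustrative and not part of the original disclosure; it assumes an ungapped alignment that excludes the terminal stop, the function name `percent_identity` is hypothetical, and the residues are copied from <SEQ ID 864> and <SEQ ID 858>.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# ORF131ng-1 (SEQ ID 864) and ORF131-1 (SEQ ID 858), stop symbol omitted.
ORF131NG_1 = (
    "MEIRVIKYTA TAALFAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI "
    "GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR "
    "TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW"
).replace(" ", "")
ORF131_1 = (
    "MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI "
    "GGESPPSLGD YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR "
    "TRDGKPLIET FKQGGFDCLE KQGLRRNGLS ERVRW"
).replace(" ", "")

identity = percent_identity(ORF131NG_1, ORF131_1)
print(f"{identity:.1f}% identity over {len(ORF131_1)} aa")  # 92.6% identity over 135 aa
```

The two sequences differ at 10 of 135 positions, which gives 125/135, or 92.6%.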



Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 102

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 865>:

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT
51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA
101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG
151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA
201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT
251 TGAACCTCGG CCTGCCtTAT ATtTcCGGCC CGCAATGGCT GTCGGAAAAC
301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACgC ACGGCAAAAC
351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATgCC GGCCTCGCGC
401 CGGGCTTCCT TATtGGCGGC GTACC.GGAA AATttCGGCG TTTCCGCCCG
451 CCTGCCGCAA ACGCCGCGCC AAGACCCGAA CAGCCAATCG CCGTTTTTcG
501 TCATCGAAGC CGACGAATAC GACACCGCCT TTtTCGACAA ACGTTCTAAA
551 TtCGTGCATT ACCGTCCGCG TACCGCCGTG TTGAACAATC TGGAATTCGA
601 CCACGCCGAC ATCTTTGCCG ACTTGGGCGC GATACAGACc CAGTTCCACT
651 ACCTCGTGCG TACCGTGCCG TCTGAAGGCT TAATCGTCTG CAACGGACGG
701 CAGCAAAGCC TGCAAGATAC TTTGGACAAA GGCTGCTGGA CGCCGGTGGA
751 AAAATTCGGC ACGGAACACG GCTGGCA..

This corresponds to the amino acid sequence <SEQ ID 866; ORF132>: 

1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV
51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN
101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VXGKFRRFRP
151 PAANAAPRPE QPIAVFRHRS RRIRHRLFRQ TFXIRALPSA YRRVEQSGIR
201 PRRHLCRLGR DTDPVPLPRA YRAVXRLNRL QRTAAKPARY FGQRLLDAGG
251 KIRHGTRLA..

Further work revealed the complete nucleotide sequence <SEQ ID 867>: 

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT
51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA
101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG
151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA
201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT
251 TGAACCTCGG CCTGCCTTAT ATTTCCGGCC CGCAATGGCT GTCGGAAAAC
301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACGC ACGGCAAAAC
351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATGCC GGCCTCGCGC
401 CGGGCTTCCT TATTGGCGGC GTACCGGAAA ATTTCGGCGT TTCCGCCCGC
451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT
501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGTTCTAAAT
551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC
601 CACGCCGACA TCTTTGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACTA
651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCTT AATCGTCTGC AACGGACGGC
701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA
751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGACGG
801 CTCGTTCGAC GTGTTGCTCG ACGGCAAAAC CGCCGGACGC GTCAAATGGG
851 ATTTGATGGG CAGGCACAAC CGCATGAACG CGCTCGCCGT CATTGCCGCC
901 GCGCGTCATG TCGGTGTCGA TATTCAGACC GCCTGCGAAG CCTTGGGCGC
951 GTTTAAAAAC GTCAAACGCC GGATGGAAAT CAAAGGCACG GCAAACGGCA
1001 TCACCGTTTA CGACGACTTC GCCCACCACC CGACCGCCAT CGAAACCACG
1051 ATTCAAGGTT TGCGCCAACG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT
1101 CGAACCGCGT TCCAACACGA TGAAGCTGGG CACGATGAAG TCCGCCCTGC
1151 CTGTAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGTG
1201 GACTGGGACG TCGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGAACGT
1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG
1301 TAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC
1351 GGAAAGCTGC TGGAAGCTTT GAGATAG

This corresponds to the amino acid sequence <SEQ ID 868; ORF132-1>:

1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV
51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN
101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR
151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD
201 HADIFADLGA IQTQFHYLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE
251 KFGTEHGWQA GEANADGSFD VLLDGKTAGR VKWDLMGRHN RMNALAVIAA
301 ARHVGVDIQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT
351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPVSLKEA DQVFCYAGGV
401 DWDVAEALAP LGGRLNVGKD FDAFVAEIVK NAEVGDHILV MSNGGFGGIH
451 GKLLEALR*

Computer analysis of this amino acid sequence gave the following results: 
Homology with the hypothetical o457 protein of E. coli (accession number U14003)
ORF132 and o457 show 58% aa identity in 140 aa overlap: 

Orf132:    4 IHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLDEFK 63
             IHI+GI GTFMGGLA +A++ G EV+G DA +YPPMST LE  GI++ +G+DA+QL+  +
o457:      3 IHILGICGTFMGGLAMLARQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-Q 61

Orf132:   64 ADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTASML 123
              D+ +IGN   RG   VEA+L   +PY+SGPQWL + VL   WVL VAGTHGKTTTA M
o457:     62 PDLVIIGNAMTRGNPCVEAVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMA 121

Orf132:  124 AWVLEYAGLAPGFLIGGVXG 143
              W+LE  G  PGF+IGGV G
o457:    122 TWILEQCGYKPGFVIGGVPG 141

Homology with a predicted ORF from N. meningitidis (strain A)

ORF132 shows 74.6% identity over a 189 aa overlap with an ORF (ORF132a) from strain A of N. meningitidis:



25 



30 



35 



40 



orfl32 .pep 
orfl32a 



orfl32.pep 
orfl32a 

orf 132 .pep 
orfl32a 

orf 132. pep 
orf!32a 



10 20 30 40 50 60 

MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 

| 1 1 1 | t I I I Mill I 1:1111111111 I I II I I I I I I I I M I I I 1 1 I IIIMIMIII 
KKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD 

10 20 30 40 50 60 



70 80 90 100 110 120 

E FKADVYV IGNVAKRGMDWEAI LNLGLPY I SGPQWLSENVLHHHWVLG VAGTHGKTTTA 

1 I I I 1 I I I f I I I 1 I I I 1 1 I I Mill I Ml I I: I! MtM UN I N Mill 

EFKADVYVIGNVAKRGMDWEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGKTTTA 

70 80 90 100 110 120 

130 140 150 160 
S MLAWV LE Y AG LA PG FL I GGVXGK FR RFRPPAANAAPRPEQPI AVFR 

1 | I I H It I II 1 I I I I I I M : I I 2 I s I : : I : 1 I 

SMLAWVLEYAGLAPGFXIGGVPENFSVSARL-PQTPRQDPNSQSPFFVIEADEYDTAFFD 

130 140 150 160 170 

170 180 190 200 210 220 

HRSRRIRHRLFRQTFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRL 

^SKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQD 
180 190 200 210 220 230 



The complete length ORF132a nucleotide sequence <SEQ ID 869> is:

1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGTGGGAT
51 TGCCGCCATT GCCAAAGAAG CAGGGTTTGA ANTCAGCGGT TGCGATGCGA
101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTG
151 TATGAAGGCT TCGACACCGC GCAGTTGGAC GAATTTAAAG CCGACGTTTA
201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT
251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAAC
301 NTGCTGCACC ATCATTGGNN ACTCGGCGTG GCGGNGACGC ACGGCAAAAC
351 GACCACCGCG TCTATGCTCG CGTGGGTTTT GGAATATGCC GGACTCGCAC
401 CGGGCTTCNT TATCGGCGGC GTACCGGAAA ACTTCAGCGT TTCCGCCCGC
451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT
501 CATTGAAGCC GACGAATACG ACACCGCGTT TTTCGACAAA CGCTCCAAAT
551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC
601 CACGCCGACA TCTTCGCCGA TTTGGGCGCG ATACAGACCC AGTTCCACCA
651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCCT CATCGTCTGC AACGGACGGC
701 AGCAAAGCCT GCAAGACACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA
751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGATGG
801 CTCGTTCGAC GTGTTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCTTGGA
851 GTTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCNGT CATCGCCGCC
901 GCGCGTCATG CCGGAGTNGA CATTCAGACG GCCTGCGAAG CCTTGAGCAC
951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGTA
1001 TCACCGTTTA CGACGACTTC GCCCACCATC CGACCGCTAT CGAAACCACG
1051 ATTCAAGGTT TGCGCCAGCG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT
1101 CGAACCGCGT TCCAATACGA TGAAGCTGGG TACGATGAAA GCCGCCCTGC
1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGNTACGC CGGCGGCGCG
1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGCACGT
1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG
1301 CAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC
1351 ACCAAACTGC TGGACGCTTT GAGATAG

This encodes a protein having amino acid sequence <SEQ ID 870>: 

1 MKHIHIIGIG GTFMGGIAAI AKEAGFEXSG CDAKMYPPMS TQLEALGIGV
51 YEGFDTAQLD EFKADVYVIG NVAKRGMDVV EAILNRGLPY ISGPQWLAEN
101 XLHHHWXLGV AXTHGKTTTA SMLAWVLEYA GLAPGFXIGG VPENFSVSAR
151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD
201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE
251 KFGTEHGWQA GEANADGSFD VLLDGKKAGH VAWSLMGGHN RMNALAVIAA
301 ARHAGVDIQT ACEALSTFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT
351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK AALPASLKEA DQVFXYAGGA
401 DWDVAEALAP LGGRLHVGKD FDAFVAEIVK NAEAGDHILV MSNGGFGGIH
451 TKLLDALR*

ORF132a and ORF132-1 show 93.9% identity in 458 aa overlap: 

25 or f 132a. pep MKHIHIIGIGGTFWGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD 

1 I i I I I I I I I I 1 1 I I 1 : 1 1 1 ! 1 I I I 1 ! I 1 I 1 1 I I I t I I 1 1 1 t 1 I I t I UIMMIIM 
orf 132-1 MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 

orf 132d . pep E FKADVYVI GN VAKRGMD WEAI LNRGLPY I SG PQWLAENXLHHHWXLGVAXTHGKTTTA 
30 * M I M I I I I I 1 I I I I I I I I I M I I I I I M I 1 I 1 il t : I I I It I I WW I I I I I I I 1 

orf 132-1 EFKADVYVIGNVAKRGMDWEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 

orf 132a . pep SMLAWVLEYAGLAPGFXIGGVPENFSVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 
1MIII! MINIMI IIMMICIIMMMIIMIIM.MIMMIIMIIMIII 
35 orf 132-1 SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 
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orf 132a . pep RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQDT 

I I II 1 I I I I II II I I M II I I II I II M I I I : I II II I M W I I I M I I II M I I 

orf 132-1 RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT 

or f 1 32a . pep LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKKAGHVAWSLMGGHKRMNALAVIAA 

I I I t I 1 1 I I t I 1 I I 1 1 t 1 I 1 I I I I I I I I t I I I I 1 I 1 11:1 Milt I M I Ml 

orfl32-l LDKGCWT P VEKFGTEHGWQAGEANADG S FDVLLDGKT AGRVKW DLMGRHNRMN ALAV I AA 



45 orf 132a .pep ARHAGVD I QTACEALSTFKNVKRRMEI KGT ANG ITVYDDFAHH PT AIETT I QGLRQRVGG 

M I : I II M II M I t : : M M M I I I I II M M M M II I I M I II M I! M I It I I I t I 
orf 132-1 ARHVGVDIQTACEALGAFKNVKRRME IKGTANG ITVYDDFAHH WAIETT IQGLRQRVGG 

or f 1 32a . pep ARIIAVLEPRSNTMKLGTMKAALPASLKEADQVFXYAGGADWDVAEALAPLGGRLHVGKD 
50 I I I I I I I II I M I I I I I I I I : I I I : M M I I tl I I I 1 1 : 1 1 I I I I I I I I I 1 1 1 1 • M 1 1 

orfl32-l ARILAVLEPRSNTMKLGTMKS ALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD 

orf 132a. pep FDAFVAEIVKNAEAGDHILVMSNGGFGGIHTKLLDALRX 
I I I I II M I M II: I M M II I I It II II I llhllll 
55 orf 132-1 FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF132 shows 89.6% identity over 259 aa overlap with a predicted ORF (ORF132ng) from N. 
gonorrhoeae: 

60 orf 132 .pep MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 60 

I I I I i I I M I M II I I : I I M I M M : I I M I I t M M M M II M M MMMIIII: 
orfl32ng MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 60 
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EFKADVWIGNVAKRGMDVVEAII^ 

. , : | | : i i t I n i • I i I i I I I l t I t M I I I I I I I I I : I I I I M I M I M I I M I I I I ! i 
E FQADI YVIGNVARRGMDWEAI LNRGLPYI SGPQWLAENVLHHHWVLGVAGTHGKTTTA 

SMLAWVLEYAGLAPGFLIGGVXGKFRRFRPPAANAAPRPEQPIAVFRHRSRRIRHRLFRQ 

1 | | ] | | | | | | 1 1 1 1 f I I 1 i 1 t |IIIIHII:IIII IMi IlilllMMIIiillll 
SMLAWVLEYAGLAPGFLIGGVPGKFRRFRPPTANAASRPEQQIAVFRHRSRRIRHRLFRQ 

T FX I RALPS A YRRVE OS G I R PRRHLCRLGR DT D PV P L PRA YRAVXRLNR LQRT AAKP AR Y 
| : III) | | | | 1 1| | | 1 I I II II 1 t I I 1 I I t 1 I I I I : t : : I : I 1 I I I I ! M I I I 
TLQIRALSPAYRRVEQSGIRPRRHLRRLGRDTDPVPPPRAHRTIRRPHRLQRTAAKPARY 

FGQRLLDAGGKIRHGTRLA 259 
I I I I I I II I II I I I MM 
FGQRLLDAGGK I RHRTRLADW 261 



120 
120 
180 
180 
240 
240 



An ORF132ng nucleotide sequence <SEQ ID 871> was predicted to encode a protein having amino acid sequence <SEQ ID 872>:

1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV
51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN
101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPGKFRRFRP
151 PTANAASRPE QQIAVFRHRS RRIRHRLFRQ TLQIRALSPA YRRVEQSGIR
201 PRRHLRRLGR DTDPVPPPRA HRTIRRPHRL QRTAAKPARY FGQRLLDAGG
251 KIRHRTRLAD W*



Further work revealed the following gonococcal DNA sequence <SEQ ID 873>:

1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGAT
51 TGCCGCCATT GCCAAAGAAG CCGGGTTCAA AGTCAGCGGT TGCGACGCGA
101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTA
151 CACGAAGGCT TCGATGCCGC GCAGTTGGAA GAATTTCAAG CCGATATTTA
201 CGTCATCGGC AATGTCGCCA GGCGCGGGAT GGATGTGGTC GAGGCGATTT
251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAac
301 GTGCtgcacc atcaTTGGgt ACTCGGCGTG GcagggaCGC ACGGcaaAac
351 gaccaCcGcg tCCATGCTCG CCTGGGTCTT GGAATATGCC GGACTCGCGC
401 CGGGCTTCCT CATCGGCGGt gtaccggaAA ATTTCGGCGT TTCCGCCCGC
451 CTACCGCAAA CGCCGCGTCA AGACCCGAAC AGCAAATCGC CGTTTTTCGT
501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGCTCCAAAT
551 TCGTGCATTA TCGCCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC
601 CACGCCGACA TCTTCGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACCA
651 CCTCGTGCGC ACCGTACCAT CCGAAGGCCT CATCGTCTGC AACGGACAGC
701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA
751 AAATTCGGCA CCGGACACGG CTGGCAGATT GGTGAAGTCA ATGCCGACGG
801 CTCGTTCGAC GTATTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCATGGG
851 ATTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCCGT CATCGCTGCC
901 GCACGCCATG CCGGAGTCGA TGTTCAGACG GCCTGCGAAG CCTTGGGTGC
951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGCA
1001 TCACCGTTTA CGACGATTTC GCCCACCACC CGACCGCCAT CGAAACCACG
1051 ATTCAAGGTT TGCGCCAACG TGTCGGCGGC GCGCGCATCC TCGCCGTCCT
1101 CGAGCCGCGT TCCAACACCA TGAAACTCGG CACGATGAAG TCCGCCCTGC
1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGCG
1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCTGCA GGCTGCGCGT
1251 CGGTAAAGAT TTCGATACCT TCGTTGCCGA AATTGTGAAA AACGCCCGAA
1301 CCGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC
1351 ACCAAACTGC TGGACGCTTT GAGATAG

This corresponds to the amino acid sequence <SEQ ID 874; ORF132ng-1>:

1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV
51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN
101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR
151 LPQTPRQDPN SKSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD
201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGQQQSLQDT LDKGCWTPVE
251 KFGTGHGWQI GEVNADGSFD VLLDGKKAGH VAWDLMGGHN RMNALAVIAA
301 ARHAGVDVQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT
351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPASLKEA DQVFCYAGGA
401 DWDVAEALAP LGCRLRVGKD FDTFVAEIVK NARTGDHILV MSNGGFGGIH
451 TKLLDALR*

ORF132ng-1 and ORF132-1 show 93.2% identity in 458 aa overlap:



orf 132ng-l .pep MKHIHIIGIGCTEMGGIAAIAKBAGFKVSGCnAKMYPP^ 

orn^ng v 1 1 | || I I M I II I M I : M I H I M I M I II 1 1 l| M II 1 1 I 1 1 I I I M I II 1 1 1 I « 
orf 132-1 MKHIHIIGIGGTFMCMU^AKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 

orfl32ng-l.pep 



EFQADI YVIGNVARRGMDWEAI LNRGLPYI SGPQWLAENV1*HHHWVLGVAGTHGKTTTA 
| | : I I : I 1 I I | 1 1 : 1 1 | | | I I I I II II II 1 1 II I M M I M M I M M I II II II M II 
orf 132-1 EFKADVYVIGNVAKRGMDVVEAI LNLGLPYI SGPQWLSENVLHHHWVLGVAGTHGKTTTA 

10 orf 132no-l . pep SMIAWVI^YAGIAPGFLIGGVPENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDK 

1U orti^ng P P 1 1 | | | j | | 1 1 | | j 1 1 | j | | | j 1 | M I I I I 1 1 1 1 | I 1 : 1 1 I I I 1 1 1 1 III MM I I 

orf 132-1 SMIAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 

orfl32nq-l.pep RSKFVHYRPRTAVl^NLEFDHADIFADI^IQTQFHHLVRTVPSEGLIVCNGQC^SLQDT 
15 9 P 1 1 1 1 | | M M 1 1 M M II I I I M II I II M II 1 1 1 1 M II I M II M II II I : I M II I I 

orf 132-1 RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT 

orfl32na-l pep LDKGCWT PVEKFGTGHGWQIGEVNADGS FDVLLDGKKAGHVAWDLMGGHNRMNALAV I AA 
* P I M M I I 1 I 1 1 I M III! IIMIIMIIIIIMI MM Mill Ml III I I II I I 
20 orfl32-l LDKGCWT PVEKFGTEHGWQAGEANADG S FDVLLDGKTAGRVKWDLMGRHNRMN ALAVI AA 

orf 132na-l .pep ARHAGVDVQTACEALGAFKNVKRRMEI KGTANGITVYDDFAHHPTAIETT IQGLRQRVGG 
IM:MI:IIIIIIIIIIMIMMIIIIIIIMIMMIIIIMIIIIIIIMmiM 
orf 132-1 ARHVGVDIQTACEALC^FKNWRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 



25 



orf 132nq-l . pep ARILAVLEPRSNTMKLGTMKSALPASLKEADQVFCYAGGADWDVAEALAPLGCRLRVGKD 
9 1 1 M I I II M II M II M M I I M M II II 1 M M M M M II I M II M M M MM 

orf 132-1 AR I LAV LE PRSNTMKLGTMKS AL PVS LKEADQVFCYAGGVDWD VAEALAPLGGRLN VGKD 



30 orfl32ng-l.pep FDT FVAE I VKNARTG DH I LVMSNGG FGGI HTKLLDALRX 

I I : \ I II M I II : M II I I M I 1 M I I M I I II M I II 
orfl32-l FDAFVAE I VKNAEVGDHI LVMSNGG FGGI HGKLLEALRX 

In addition, ORF132ng-l is homologous to a hypothetical E.coli protein: 

pir||S56459 hypothetical protein o457 - Escherichia coli >gi|537075 (U14003)
ORF o457 [Escherichia coli] >gi|1790680 (AE000494) hypothetical 48.5 kD protein
in fbp-pmba intergenic region [Escherichia coli] Length = 457
Score = 474 bits (1207), Expect = e-133
Identities = 249/439 (56%), Positives = 294/439 (66%), Gaps = 13/439 (2%)

40 Query: 22 KEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLEEFQADIYVIGNVARRGMDWE 81 

++ G +V+G DA +YPPMST LE GI + +G+DA+QLE Q D+ +IGN RG VE 
SbjCt: 21 RQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-QPDLVIIGNAMTRGNPCVE 79 

Query 82 AILNRGLPYISGPQWIAEOTLHHHWVLGVAGTHGKTTTASMLAWVLEYAGLAPGFLIGGV 141 
45 a+L + +PY+SGPQWL + VL WVL VAGTHGKTTTA M W+LE G PGF+IGGV 

Sbjct: 80 AVLEKNIPYMSGPQWLHDFVUIDRWV1AVAGTHGKTTTAGMATWILEQCGYKPGFVIGGV 139 

Query 142 PENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDKRSKFVHYRPRTAVLNNLEFDH 201 
P NF VSA L +S FFVIEADEYD AFFDKRSKFVHY PRT +LNNLEFDH 

50 SbjCt: 140 PGNFEVSAHL GESDFFVIEADEYDCAFFDKRSKFVHYCPRTLILNNLEFDH 190 

Query: 202 ADIFADLGAIQTQFKHLVRTVPSEGLIVCNGQQQSLQDTLDKGCWTPVEKFGTGHGWQIG 261 

ADIF DL AIQ QFHHLVR VP +G 1+ +L+ T+ GCW+ EG WQ 

Sbjct: 191 ADIFDDLKAIQKQFHHLVRIVPGQGRIIWPENDINLKQTMAMGCWSEQELVGEQGHWQAK 250 

Query 262 EVNADGS^FDVLLDGKKAGHVAWDI^GGHNRMNAIJVVIAAARHAGVDVQTACEALGAFKN 320 

++ OS ++VLLDG+K G V W L+G HN N L IAAARH GV A ALG+F N 
SbjCt: 251 KLTTDASEWEVLLDGEKVGEVKWSLVGEHNMHNGLMAIAAARHVGVAPADAANAIjGSFIN 310 

60 Query: 321 VKRRMEIKGTANGITWDDFAHHPTAIETTIQGIJ^QRVGG-ARI1AV1£PRSNTMKLGTM 379 

+RR+E++G ANG+TVYDDFAHHPTAI T+ LR +VGG ARI + AVLE PRSNTMK+G 
Sbjct: 311 7VRRRLELRGEANGVTVYDDFAHHPTAILATLAALRGKVGGTARIIAVLEPRSNTMKMGIC 370 

Query: 380 KSALPASLKEADQVF-CYAGGADWDVAEAIAPIX3CR1J^VGKDFDTFVAEIVKRARTGDHI 438 
65 K L SL AD+VF W VAE D DT +VK A+ GDHI 

Sbjct: 371 KDDlAPSLGRADEVFLliQPAHIPWQVAEVAEACVQPAHWSGDVDTlADMVVKTAQPGDHI 430 

Query: 439 LVMSNGG FGG I HTKLLDAL 457 
LVMSNGGFGGIH KLLD L
Sbjct: 431 LVMSN GG FGG I HQKLL DG L 449 
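The percentages in the BLAST summary for this alignment can be re-derived from the raw counts. The sketch below assumes that the program truncates each percentage to a whole number rather than rounding it, which is consistent with the three values reported here; the helper name `blast_percent` is hypothetical and is used only for illustration.

```python
def blast_percent(count: int, length: int) -> int:
    """Whole-number percentage, truncated (not rounded)."""
    return (100 * count) // length


# Counts copied from the summary line of the ORF132ng-1 / o457 alignment.
summary = {
    "Identities": (249, 439),
    "Positives": (294, 439),
    "Gaps": (13, 439),
}

for name, (count, length) in summary.items():
    print(f"{name} = {count}/{length} ({blast_percent(count, length)}%)")
```

With rounding instead of truncation, 249/439 and 294/439 would be reported as 57% and 67%, so the reported figures are slightly conservative.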

Based on this analysis, it was predicted that these proteins from N. meningitidis and N.gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF132-1 (26.4kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
20A shows the results of affinity purification of the His-fusion protein, and Figure 20B shows the
results of expression of the GST-fusion in E.coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for FACS analysis (Figure 20C) and ELISA (positive result). These
experiments confirm that ORF132 is a surface-exposed protein, and that it is a useful immunogen.

Example 103 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 875>:

1 CCGGGCTATT ACGGCTCGGA TGACGAATTT AAGCGGGCAT TCGGAGAAAA

51 cTCGCCGACA TmCAAGAAAC ATTGCAACCG GAGCTGCGGG ATTTATGAAC

101 CCGTATTGAA AAAATACGGC AAAAAGCGCG CCAACAACCA TTCGGTCAGC

151 ATTAGTGCGG ACTTCGGCGA TTATTTCATG CCGTTCGCCA GCTATTCGCG

201 CACACACCGT ATGCCCAACA TCCAAGAAAT GTATTTTTCC CAAATCGGCG

251 ACTCCGGCGT TCACACCGCC TTAAAACCAG AGCGCGCAAA CACTTGGCAA

301 TTTGGCTTCr ATACCTATAA AAAAGGATTG TTAAAACAAG ATGATACATT

351 AGGATTAAAA CTGGTCGGCT ACCGCAGCCG CATCGACAAC TACATCCACA

401 ACGTTTACGG GAAATGGTGG GATTTGAACG GGGATATTCC GAGCTGGGTC

451 AGCAGCACCG GGCTTGCCTA CACCATCCAA CATCGCrATT TCAwAGACAA

501 AGTGCATCAA nnnnnnnnnn nnnnnnnnnn nnnnTACGAT TATGGGCGTT

551 TTTTCACCAA CCTTTCTTAC GCCTATCAAA AAAGCACGCA ACCGACCAAC

601 TTCAGCGATG CGAGCGAATC GCCCAACAAT GCGTCCAAAG AAGACCAACT

651 CAAACAAGGT TATGGGTTGA GCAGGGTTTC CGCCCTGCCG CGAGATTACG

701 GACGTTTGGA AGTCGGTACG CGCTGGTTGG GCAACAAACT GACTTTGGGC

751 GGCGCGATGC GCTATTTCGG CAAGAGCATC CGCGCGACGG CTGAAGAACG

801 CTATATCGAC GGCACCAACG GGGGAAATAC CAGCAATTTC CGGCAACTGG

851 GCAAGCGTTC CATCAAACAA ACCGAAACTC TTGCCCGCCA GCCTTTGATT

901 TTwGATTTTa ACGCCGCTTA CGAGCCGAAG AAAAACCTTA TTTTCCGCGC

951 CGAAGTCAAA AATCTGTTCG ACAGGCGTTA TATCGATCCG CTCGATGCGG

1001 GCAATGATGC GGCAAC.GAG CGTTATTACA GCTCGTTCGA CCCGAAAGAC

1051 AAGGACrrAG ACGTAACGTG TAATGCTGAT AAAACGTTGT GCaACGGCAA

1101 ATACGGCGGC ACAAGCAAAA GCGTATTGAC CAATTTTGCA CGCGGACGCA

1151 CCTTTTTgAT GACGATGAGC TACAAGTTTT AA

This corresponds to the amino acid sequence <SEQ ID 876; ORF133>: 

1 PGYYGSDDEF KRAFGENSPT XKKHCNRSCG IYEPVLKKYG KKRANNHSVS

51 ISADFGDYFM PFASYSRTHR MPNIQEMYFS QIGDSGVHTA LKPERANTWQ

101 FGFXTYKKGL LKQDDTLGLK LVGYRSRIDN YIHNVYGKWW DLNGDIPSWV

151 SSTGLAYTIQ HRXFXDKVHQ XXXXXXXXYD YGRFFTNLSY AYQKSTQPTN

201 FSDASESPNN ASKEDQLKQG YGLSRVSALP RDYGRLEVGT RWLGNKLTLG

251 GAMRYFGKSI RATAEERYID GTNGGNTSNF RQLGKRSIKQ TETLARQPLI

301 XDFNAAYEPK KNLIFRAEVK NLFDRRYIDP LDAGNDAAXE RYYSSFDPKD

351 KDXDVTCNAD KTLCNGKYGG TSKSVLTNFA RGRTFLMTMS YKF*
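The correspondence between the partial DNA sequence and the amino acid sequence listed for ORF133 can be checked with a short translation routine. The following is a minimal sketch that assumes the standard genetic code and reading frame 1; the function name `translate` is hypothetical, and any codon containing an ambiguous or unreadable base is simply reported as X, which is how the listing above appears to treat such positions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string; unknown codons become 'X'."""
    dna = "".join(dna.split()).upper()  # drop block spacing, ignore case
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First thirty bases of the partial ORF133 sequence, as printed above.
print(translate("CCGGGCTATT ACGGCTCGGA TGACGAATTT"))
```

Lower-case bases in the listing are treated here as ordinary bases, and a codon such as TmC (m being the ambiguity code for A or C) yields X.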

Further work revealed the further partial DNA sequence <SEQ ID 877>: 

1 GAGGCGCAGA TACAGGTTTT GGAAGATGTG CACGTCAAGG CGAAGCGCGT 

51 ACCGAAAGAC AAAAAAGTGT TTACCGATGC GCGTGCCGTA TCGACCCGTC 

101 AGGATATATT CAAATCCAGC GAAAACCTCG ACAACATCGT ACGCAGCATC 

151 CCCGGTGCGT TTACACAGCA AGATAAAAGC TCGGGCATTG TGTCTTTGAA

201 TATTCGCGGC GACAGCGGGT TCGGGCGGGT CAATACGATG GTGGACGGCA 

251 TCACGCAGAC CTTTTATTCG ACTTCTACCG ATGCGGGCAG GGCAGGCGGT

301 TCATCTCAAT TCGGTGCATC TGTCGACAGC AATTTTATTG CCGGACTGGA

351 TGTCGTCAAA GGCAGCTTCA GCGGCTCGGC AGGCATCAAC AGCCTTGCCG

401 GTTCGGCGAA TCTGCGGACT TTAGGCGTGG ATGACGTCGT TCAGGGCAAT

451 AATACCTACG GCCTGCTGCT AAAAGGTCTG ACCGGCACCA ATTCAACCAA

501 AGGTAATGCG ATGGCGGCGA TAGGTGCGCG CAAATGGCTG GAAAGCGGAG

551 CATCTGTCGG TGTGCTTTAC GGGCACAGCA GGCGCAGCGT GGCGCAAAAT

601 TACCGCGTGG GCGGCGGCGG GCAGCACATC GGAAATTTTG GCGCGGAATA

651 TTTGGAACGG CGCAAGCAGC GATATTTTGT ACAAGAGGGT GCTTTGAAAT

701 TCAATTCCGA CAGCGGAAAA TGGGAGCGGG ATTTACAAAG GCAACAGTGG

751 AAATACAAGC CGTATAAAAA TTACAACAAC CAAGAACTAC AaAAATACAT

801 CGAAGAGCAT GACAAAAGCT GGCGGGAAAA CCTg.CaCCG CAATACGACA

851 TTACCCCCAT CGATCCGTCC AGCCTGAAGC AGCAGTCGGC AGGCAATCTG

901 TTTAAATTGG AATACGACGG CGTATTCAAT AAATACACGG CGCAATTTCG

951 CGATTTAAAC ACCAAAATCG GCAGCCGCAA AATCATCAAC CGCAATTATC

1001 AGTTCAATTA CGGTTTGTCT TTGAACCCGT ATACCAACCT CAATCTGACC

1051 GCAGCCTACA ATTCGGGCAG GCAGAAATAT CCGAAAGGGT CGAAGTTTAC

1101 AGGCTGGGGG CTTTTAAAGG ATTTTGAAAC CTACAACAAC GCGAAAATCC

1151 TCGACCTCAA CAACACCGCC ACCTTCCGGC TGCCCCGCGA AACCGAGTTG

1201 CAAACCACTT TGGGCTTCAA TTATTTCCAC AACGAATACG GCAAAAACCG

1251 CTTTCCTGAA GAATTGGGGC TGTTTTTCGA CGGTCCTGAT CAGGACAACG

1301 GGCTTTATTC CTATTTGGGG CGGTTTAAGG GCGATAAAGG GCTGCTGCCC

1351 CAAAAATCAA CCATTGTCCA ACCGGCCGGC AGCCAATATT TCAACACGTT

1401 CTACTTCGAT GCCGCGCTCA AAAAAGACAT TTACCGCTTA AACTACAGCA

1451 CCAATACCGT CGGCTACCGT TTCGGCGGCG AATATACGGG CTATTACGGC

1501 TCGGATGACG AATTTAAGCG GGCATTCGGA GAAAACTCGC CGACATACAA

1551 GAAACATTGC AACCGGAGCT GCGGGATTTA TGAACCCGTA TTGAAAAAAT

1601 ACGGCAAAAA GCGCGCCAAC AACCATTCGG TCAGCATTAG TGCGGACTTC

1651 GGCGATTATT TCATGCCGTT CGCCAGCTAT TCGCGCACAC ACCGTATGCC

1701 CAACATCCAA GAAATGTATT TTTCCCAAAT CGGCGACTCC GGCGTTCACA

1751 CCGCCTTAAA ACCAGAGCGC GCAAACACTT GGCAATTTGG CTTCAATACC

1801 TATAAAAAAG GATTGTTAAA ACAAGATGAT ACATTAGGAT TAAAACTGGT

1851 CGGCTACCGC AGCCGCATCG ACAACTACAT CCACAACGTT TACGGGAAAT

1901 GGTGGGATTT GAACGGGGAT ATTCCGAGCT GGGTCAGCAG CACCGGGCTT

1951 GCCTACACCA TCCAACATCG CAATTTCAAA GACAAAGTGC ACAAACACGG

2001 TTTTGAGTTG GAGCTGAATT ACGATTATGG GCGTTTTTTC ACCAACCTTT

2051 CTTACGCCTA TCAAAAAAGC ACGCAACCGA CCAACTTCAG CGATGCGAGC

2101 GAATCGCCCA ACAATGCGTC CAAAGAAGAC CAACTCAAAC AAGGTTATGG

2151 GTTGAGCAGG GTTTCCGCCC TGCCGCGAGA TTACGGACGT TTGGAAGTCG

2201 GTACGCGCTG GTTGGGCAAC AAACTGACTT TGGGCGGCGC GATGCGCTAT

2251 TTCGGCAAGA GCATCCGCGC GACGGCTGAA GAACGCTATA TCGACGGCAC

2301 CAACGGGGGA AATACCAGCA ATTTCCGGCA ACTGGGCAAG CGTTCCATCA

2351 AACAAACCGA AACTCTTGCC CGCCAGCCTT TGATTTTTGA TTTTTACGCC

2401 GCTTACGAGC CGAAGAAAAA CCTTATTTTC CGCGCCGAAG TCAAAAATCT

2451 GTTCGACAGG CGTTATATCG ATCCGCTCGA TGCGGGCAAT GATGCGGCAA

2501 CGCAGCGTTA TTACAGCTCG TTCGACCCGA AAGACAAGGA CGAAGACGTA

2551 ACGTGTAATG CTGATAAAAC GTTGTGCAAC GGCAAATACG GCGGCACAAG

2601 CAAAAGCGTA TTGACCAATT TTGCACGCGG ACGCACCTTT TTGATGACGA

2651 TGAGCTACAA GTTTTAA



This corresponds to the amino acid sequence <SEQ ID 878; ORF133-1>:



1 EAQIQVLEDV HVKAKRVPKD KKVFTDARAV STRQDIFKSS ENLDNIVRSI

51 PGAFTQQDKS SGIVSLNIRG DSGFGRVNTM VDGITQTFYS TSTDAGRAGG

101 SSQFGASVDS NFIAGLDVVK GSFSGSAGIN SLAGSANLRT LGVDDVVQGN

151 NTYGLLLKGL TGTNSTKGNA MAAIGARKWL ESGASVGVLY GHSRRSVAQN

201 YRVGGGGQHI GNFGAEYLER RKQRYFVQEG ALKFNSDSGK WERDLQRQQW

251 KYKPYKNYNN QELQKYIEEH DKSWRENLXP QYDITPIDPS SLKQQSAGNL

301 FKLEYDGVFN KYTAQFRDLN TKIGSRKIIN RNYQFNYGLS LNPYTNLNLT

351 AAYNSGRQKY PKGSKFTGWG LLKDFETYNN AKILDLNNTA TFRLPRETEL

401 QTTLGFNYFH NEYGKNRFPE ELGLFFDGPD QDNGLYSYLG RFKGDKGLLP

451 QKSTIVQPAG SQYFNTFYFD AALKKDIYRL NYSTNTVGYR FGGEYTGYYG

501 SDDEFKRAFG ENSPTYKKHC NRSCGIYEPV LKKYGKKRAN NHSVSISADF

551 GDYFMPFASY SRTHRMPNIQ EMYFSQIGDS GVHTALKPER ANTWQFGFNT

601 YKKGLLKQDD TLGLKLVGYR SRIDNYIHNV YGKWWDLNGD IPSWVSSTGL

651 AYTIQHRNFK DKVHKHGFEL ELNYDYGRFF TNLSYAYQKS TQPTNFSDAS

701 ESPNNASKED QLKQGYGLSR VSALPRDYGR LEVGTRWLGN KLTLGGAMRY

751 FGKSIRATAE ERYIDGTNGG NTSNFRQLGK RSIKQTETLA RQPLIFDFYA

801 AYEPKKNLIF RAEVKNLFDR RYIDPLDAGN DAATQRYYSS FDPKDKDEDV

851 TCNADKTLCN GKYGGTSKSV LTNFARGRTF LMTMSYKF*



Computer analysis of this amino acid sequence gave the following results: 



PCT/IB98/01665 

WO 99/24578 


Homology with the probable TonB-dependent receptor HI121 of H.influenzae (accession number U32801)
ORF133 and HI121 show 57% aa identity in 363aa overlap: 

IYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVH7A 90 

I EP+L K G K+A NHS ++SA+ DYFMPF +YSRTHRMPNIQEM+FSQ+ ++GV+TA 
INEPILHKSGHKKAFNHSATLSAELSDYFMPFETYSRTHRMPNIQEMFFSQVSNAGVNTA 622 

LKPERANTWQFGFXTYKKGLUCQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWV-15C 

LKPE+++T+Q GF TYKKGL QDD LG+KLVGYRS I NYIHNVYG WW +P+W 
LKPEQSDTYQLGFNTYKKGLFTQDDVLGVKLVGYRSFIKNYIHNVYGVWW RDGMPTWA 680 

SSTGLAYTIQHRXFXDKVHXXXXXXXXXYDYGRFFTKLSYAYQKSTQPTNFSDASESPNN 210 

S G YTI K+ 4 V YD GRFF N+SYAYQ++ QPTN++DAS PNN 

ESNGFKYTIAHQNYKPIVKKSGVELEIN YDMGRFFANVSYAYQRTNQPTKYADASPRPNN 740 

ASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYID 270 
AS+ED LKQGYGLSRVS LP+DYGRLE+GTRW KLTLG A RY+GKS RAT EE YI+ 
ASQEDILKQGYGLSRVSMLPKDYGRLELGTRWFDQKLTLGLAARYYGKSKRATIEEEYIN 800 

GTNGGNTSNFRQLGKRSIKOTETLT^RQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDP 330 
20 G+ + R+ ++K+TE + +QP+I D + +YEP K+LI +AEV+NL D+RY+DP 



LDAGNDAA +RYYSS + + C D + C GG+ K+VL NFARGRT++++++ 
25 HI121: 860 LDAGNDAASQRYYSSL NNSIECAQDSSAC GGSDKTVLYNFARGRTYILSLN 910 














Orf 133 : 




HI121 : 


623 


Orfl33: 


151 


HI121 : 


631 


Orf 133: 


211 


HI121: 


741 


Orfl33: 


271 


HI121: 


801 


Orfl33: 


331 


HI121: 


8 60 


Orf 133: 


391 


HI121: 


911 



YKF 



Homology with a predicted ORF from N. meningitidis (strain A)
ORF133 shows 90.8% identity over a 392aa overlap with an ORF (ORF133a) from strain A of N. 
meningitidis: 

10 20 30 

orfim oeo PGYYGSDDEFKRAFGENSPTXKKHCNRSCGI 

-P P III | | | | MUM 1111:1111 

orf* 33a FYFDAALKKDIYRLNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGI 
450 460 47C 480 490 500 

40 40 50 60 70 80 90 

orf 133 pep YEPVLKKYGKKRANNKSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 
UIMMMMMMMMMMUMMMMMMMMMMMMMMMMMI 

orf 133a yepvlkkygkkraknhsvsisadfgdyfmpfasysrthrmpniqemyfsqigdsgvhtal 

510 520 530 540 550 560 


100 110 120 130 140 150 

orf 133 pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS 

|| llll 1 1 1 1 1 t I I I I I MINIMUM! MMMMMMMMMMI 

orf 133a KPERANTOQFGFNTYKKGLLKQDDILGLK^ 
50 570 580 590 600 610 620 

160 170 180 190 200 210 

orf 133 pep STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 

| | | | M M I M I MM: M! I I 1 1 1 1 M I I I I II I I I I I I I I I I 

55 orf 133a STGLAYTIQHRNFKDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFS DASES PNNA 

630 640 650 660 670 680 

220 230 240 250 260 270 

orf 133 pep SKEDQLKQGYGLSRVSALPRDYGR3-EVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 

60 | M M M M M I M M M M M I M M I M M M M M M M M M I M M M I 

orfl33a SKEDQLKQGYGLSRVSALPRDYGRI^VGTRWl^NKLTLGGAMRYFGKSIRATAEERYIDX 
690 700 710 720 730 740 

280 290 300 310 320 330 

65 orf 133. pep TOGGNTSNFRQI^KRSIKQTETI^QPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL 






III | | Mil II I Hi I II II I I III I I I "I III MM II IN I IN II IM I 
orfl33a TNGXXTSNFRQI^KRSIXQTETU^QPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPL 

750 760 770 780 790 800 

340 350 360 370 380 390 

orf 133 . pep DAGNDAAXERYYSSFDPKDKDXDWCKADKTLCNGKYGGTSKSVLTNFAKGRTFllTrMSY 
I I || |l |:: II III M II I II UNI |:l I I MM Mill || Ml III I MM MM 
orf 133a DAGNDAATQRYYSSFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSY 
810 820 830 840 850 860 






orf 133. pep 
orfl33a 



KFX 
I I I 
KFX 
870 



A partial ORF133a nucleotide sequence <SEQ ID 879> is: 
1 AAAGACAAAA AAGTGTTTAC CGATGCGCGT GCCGTATCGA CCCGTCAGGA

51 TATATTCAAA TCCANCGAAA ACCTCGACAA CATCGTACGC ANCATCCCCG

101 GTGCGTTTAC ACANCAANAT AAAAGCTCGG GCNTTGTGTC TTTGAATATT

151 CGCNGCGACA GCGGGTTCGG GCGGGTCAAT ACNATGGTNG ACGGCATCAC

201 NCANACCTTT TATTCGACTT CTACCGATGC GGGCAGGGCA GGCGGTTCAT

251 CTCAATTCGG TGCATCTGTC GACAGCAATT TTATNGCCGG ACTGGATGTC

301 GTCAAAGGCA GCTTCAGCGG CTCGGCAGGC ATCAACAGCC TTGCCGGTTC

351 GGCGAATCTG CGGACTTTAN GCGTGGATGA TGTCGTTCAG GGCAATANTA

401 CNTACGGCCT GCTGCTAAAA GGTCTGACCG GCACCAATTC AACCAAAGGT

451 AATGCGATGG CGGCGATAGG TGCGCGCAAA TGGCTGGAAA GCGGAGCATC

501 TGTCGGTGTG CTTTACGGGC ACAGCAGGCG CAGCGTGGCG CAAAATTACC

551 GCGTGGGCGG CGGCGGGCAG CACATCGGAA ATTTTGGCGC GGAATATCTG

601 GAACGACGCA AGCAACGATA TTTTGAGCAA GAAGGCGGGT TGAAATTCAA

651 TTCCAACAGC GGAAAATGGG AGCGGGATTT CCAAAAGTCG TACTGGAAAA

701 CCAAGTGGTA TCAAAAATAC GATGCCCCCC AAGAACTGCA AAAATACATC

751 GAAGGTCATG ATAAAAGCTG GCGGGAAAAC CTGGCGCCGC AATACGACAT

801 CACCCCCATC GATCCGTCCA GCCTGAAGCN GCAGTCGGCA GGCAACCTGT

851 TTAAATTGGA ATACGACGGC GTATTCAATA AATACACGGC GCAATTTCGC

901 GATTTAAACA CCAAAATCGG CAGCCGCAAA ATCATCAACC GCAATTATCA

951 ATTCAATTAC GGTTTGTCTT TGAACCCGTA TACCAACCTC AATCTGACCG

1001 CAGCCTACAA TTCGGGCAGG CAGAAATATC CGAAAGGGTC GAAGTTTACA

1051 GGCTGGGGGC TTTTNAAAGA TTTTGAAACC TACAACAACG CAAAAATCCT

1101 CGACCTCANC AACACCTCCA CCTTCCGGCT GCCCCGTGAA ACCGAGTTGC

1151 AAACCACTTT GGGCTTCAAT TATTTCCACA ACGAATACGG CAAAAACCGC

1201 TTTCCTGAAG AATTGGGGCT GTTTTTCGAC GGTCCGGATC ANGACAACGG

1251 GCTTTATTCC TATTTGGGGC GGTTTAAGGG CGATAAAGGG CTGCTGCCCC

1301 AAAAATCAAC CATTGTCCAA CCGGCCGGCA GCCAATATTT CAACACGTTC

1351 TACTTCGATG CCGCGCTCAA AAAAGACATT TACCGCTTAA ACTACAGCAC

1401 CAATACCGTC GGCTACCGTT TCGGCGGCNA ATATACGGGC TATTACNGCT

1451 CGGATGACGA ATTTAAGCGG GCATTCGGAG AAAACTCGCC GACATACANG

1501 AAACATTGCA ACCAGAGCTG CGGAATTTAT GAACCCGTAT TGAAAAAATA

1551 CGGCAAAAAG CGCGCCAACA ACCATTCGGT CAGCATTAGT GCGGACTTCG

1601 GCGATTATTT CATGCCGTTC GCCAGCTATT CGCGCACACA CCGTATGCCC

1651 AACATCCAAG AAATGTATTT TTCCCAAATC GGCGACTCCG GCGTTCACAC

1701 CGCCTTAAAA CCAGAGCGCG CAAACACTTG GCAATTTGGC TTCAATACCT

1751 ATAAAAAAGG ATTGTTAAAA CAAGATGATA TATTAGGATT AAAACTGGTC

1801 GGCTACCGCA GCCGCATCGA CNACTACATC CACAACGTTT ACGGGAAATG

1851 GTGGGATTTG AACGGGAATA TTCCGAGCTG GGTCAGCAGC ACCGGGCTTG

1901 CCTACACCAT CCAACACCGC AATTTCAAAG ACAAAGTGCA CAAACACGGT

1951 TTTGAGTTGG AGCTGAATTA CGATTATNGG CGTTTTTTCA CCAACCTTTC

2001 TTACGCCTAT CAAAAAAGCA CGCAACCGAC CAACTTCAGC GATGCGAGCG

2051 AATCGCCCAA CAATGCGTCC AAAGAAGACC AACTCAAACA AGGTTATGGG

2101 TTGAGCAGGG TTTCCGCCCT GCCGCGAGAT TACGGACGTT TGGAAGTCGG

2151 TACGCGCTGG TTGGGCAACA AACTGACTTT GGGCGGCGCG ATGCGCTATT

2201 TCGGCAAGAG CATCCGCGCG ACGGCTGAAG AACGCTATAT CGACGNCACC

2251 AATGGGGNAN NTACCAGCAA TTTCCGGCAA CTGGGCAAGC GTTCCATCAN

2301 ACAAACCGAA ACCCTTGCCC GCCAGCCTTT GATTTTTGAT TTNTACGCCG

2351 CTTACGAGCC GAAGAAAAAN CTTATTTTCC GCGCCGAAGT CAAAAATCTG

2401 TTCGACAGGC GTTATATCGA TCCGCTCGAT GCGGGCAATG ATGCGGCAAC

2451 GCAGCGTTAT TACAGTTCGT TCGACCCGAA AGACAAGGAC GAAGAAGTAA

2501 CGTGTAATGA TGATAACACG TTATGCAACG GCAAATACGG CGGCACAAGC

2551 AAAAGCGTAT TGACCAATTT TGCACGCGGA CNCACCTTTT TGATAACGAT

2601 GAGCTACAAG TTTTAA
This encodes a protein having (partial) amino acid sequence <SEQ ID 880>:



1 KDKKVFTDAR AVSTRQDIFK SXENLDNIVR XIPGAFTXQX KSSGXVSLNI

51 RXDSGFGRVN TMVDGITXTF YSTSTDAGRA GGSSQFGASV DSNFXAGLDV

101 VKGSFSGSAG INSLAGSANL RTLXVDDVVQ GNXTYGLLLK GLTGTNSTKG

151 NAMAAIGARK WLESGASVGV LYGHSRRSVA QNYRVGGGGQ HIGNFGAEYL

201 ERRKQRYFEQ EGGLKFNSNS GKWERDFQKS YWKTKWYQKY DAPQELQKYI

251 EGHDKSWREN LAPQYDITPI DPSSLKXQSA GNLFKLEYDG VFNKYTAQFR

301 DLNTKIGSRK IINRNYQFNY GLSLNPYTNL NLTAAYNSGR QKYPKGSKFT

351 GWGLXKDFET YNNAKILDLX NTSTFRLPRE TELQTTLGFN YFHNEYGKNR

401 FPEELGLFFD GPDXDNGLYS YLGRFKGDKG LLPQKSTIVQ PAGSQYFNTF

451 YFDAALKKDI YRLNYSTNTV GYRFGGXYTG YYXSDDEFKR AFGENSPTYX

501 KHCNQSCGIY EPVLKKYGKK RANNHSVSIS ADFGDYFMPF ASYSRTHRMP

551 NIQEMYFSQI GDSGVHTALK PERANTWQFG FNTYKKGLLK QDDILGLKLV

601 GYRSRIDXYI HNVYGKWWDL NGNIPSWVSS TGLAYTIQHR NFKDKVHKHG

651 FELELNYDYX RFFTNLSYAY QKSTQPTNFS DASESPNNAS KEDQLKQGYG

701 LSRVSALPRD YGRLEVGTRW LGNKLTLGGA MRYFGKSIRA TAEERYIDXT

751 NGXXTSNFRQ LGKRSIXQTE TLARQPLIFD XYAAYEPKKX LIFRAEVKNL

801 FDRRYIDPLD AGNDAATQRY YSSFDPKDKD EEVTCNDDNT LCNGKYGGTS

851 KSVLTNFARG XTFLITMSYK F*



ORF133a and ORF133-1 show 94.3% identity in 871 aa overlap:
orf 133a. pep
orfl33-l 

orf 133a. pep 
orfl33-l 

orf 133a. pep 
orfl33-l 

orf 133a. pep 
orfl33-l 

orf 133a. pep 
orfl33-l 

orf 133a. pep 
orfl33-l 

orf 133a. pep 
orfl33-l 

orf 133a. pep 
orfl33-l 



10 20 30 40 

KDKKVFTDARAVSTRQDIFKSXENLDNIVRXIPGAFTXQXKS 

I I I II I I I I I I I I ! I I I I I I I MINIM MINI I ) I 
EAQIQVLEDVHVKAKRVPKDKKVFTDARAVSTRQDIFKSSENLDNIVRSIPGAFTQQDKS 

10 20 30 40 50 



60 



50 60 70 80 90 100 

SGXVSLNIRXDSGFGRVNTMVDGITXTFYSTSTDAGRAGGSSQFGASVDSNFXAGLDWK 

II mill MIMHIMMMI M M 1 1 M 1 1 1 M I M M I I I M M I MMIM 
SGIVSLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDWK 
70 80 90 100 110 120 

110 120 130 140 150 160 

GSFSGSAGINSLAGSANLRTLXVDDWQGNXTYGLLLKGLTGTNSTKGNAMAAIGARKWL 

I | | | | | I I i I II I M I M i I I MMMM I M I I I I M I I I M M M M I M I I I II I 
GSFSGSAGINSLAGSANLRTLGVDDWQGNNTYGLLLKGLTGTNSTKGNAMAAIGARKWL 

130 140 150 160 170 180 

170 180 190 200 210 220 

ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFEQEGGLKFNSNSGK 

i | ! I I I I I M II M I I M t I II M M It I I I! M I I I I M M M M IIMIIMMMI 
ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFVQEGALKFNSDSGK 
190 200 210 220 230 240 

230 240 250 260 270 280 

WERDFQKSYWKTKWYQKYDAPQELQKYIEGHDKSWRENLAPQYDITPIDPSSLKXQSAGN 

||||:|:: || I |::|: MMMM MMMIII MMIMMIMM Mill 
WERDLQRQQWKYKPYKNYNN-QELQKYIEEHDKSWRENLXPQYDITPIDPSSLKQQSAGN 

250 260 270 280 290 

290 300 310 320 330 340 

LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK 

IIIIMIIMIIMIIIIIMMMMMMItMl MIIMMMMMMMI 

LFKLEYDGVFNKYTAQFRDLNTKIGSRKI INRNYQFNYGLSLN PYTNLNLTAAYNSGRQK 
300 310 320 330 340 350 

350 360 370 380 390 400 

YPKGSKFTGWGUCKDFETYNNAKILDLXNTSTFRLPRETELQTTLGFNYFHNEYGKNRFP 

IIIHtMIIM MMIMMIMM II M M M II II M I II I M M M M M M I I 
YPKGSKFTGWGLLKDFETYNNAKILDLNNTATFRLFRETELQTTLGFNYFHKEYGKNRFP 
360 370 380 390 400 410 

410 420 430 440 450 460 

EELGLFFDGPDXDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR 
| | | 1 I I I I I I I II I II I II M I M I I M I M I I I I M I I M II I M M I M M M I I I I 
EELGLFFOGPDQDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR 
420 430 440 450 460 470 



4^0 480 490 500 *10 520 

orf 133a . pep I^YSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGIYEPVUQC^GKKFA 

orf 133-1 LNYSTinVGYRFGGEYTGYYGSDDEFKIUFGENSPTYKKHCTRSCGIYEPVLKKYGKKRA 
480 490 500 510 520 530 

530 540 550 560 570 5B0 

orf 133a . pep NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAUCPERANTWQFGFN 

orf 133-1 NNHSVSISADFGDYHWPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN 
540 550 560 570 580 590 

590 600 610 620 630 640 

orf 133a . pep TYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVSSTGLAYTIQHRNF 
P | | | | I | | l | | | | I | || II I I I I II I I I I II I I I I I I I I : M I M I I I I I I I 1 I I I I M 

orf 133-1 TYKKGLLKQDiyTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVSSTGLAYTIQHRNF 
600 610 620 630 640 650 

650 660 670 680 690 700 

orf 133a . pep KDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS 
1 | I | I ! I I I 1 1 | i 1 t I I I J t I I i 1 I I i I I t t I I I 1 I i I 1 1 I 1 I I t 1 I 1 I I 1 1 t • I K I I I 
orf 133-1 KDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS 
660 670 680 690 700 710 

710 720 730 740 750 760 

orf 133a pep RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDXTNGXXTSNFRQLG 

1 I I 1 I I I I I I I I I I I I I t I I I I I I I I I I I I I I M I 1 I I II I I ! I MINIM 

orf 133-1 RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDGTNGGNTSKFRQLG 
720 730 740 750 760 770 

770 780 790 800 810 820 

orf 133a pep KRS IXQTETLARQPLI FDXYAAYEPKKXLI FRAEVKNLFDRRYIDPLDAGNDAATQRYYS 
Mil I I I I I I I I I 1 I I I lllltlll M 1 M M II U M I M M I I M M I N I I M I 
orf 133-1 KRSIKQTETLARQPLIFDFYAAYEPKKNLI FRAEVKNLFDRRYIDPLDAGNDAATQRYYS 

780 790 800 810 820 830 

830 840 850 860 870 

orf 133a . pep SFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSYKFX 
I I) M I I I I : M I I I : M 1 M I M I ! I M I II I I I M IIMIMMM 
orf 133-1 SFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSYKFX 
840 850 860 870 880 
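An identity figure of the kind quoted for these alignments can be recomputed from any pair of aligned rows. The sketch below is illustrative only: it assumes two equal-length, already aligned strings in which "-" marks a gap, counts a column as identical when both residues are the same character, and divides by the number of aligned columns. The function name `percent_identity` is hypothetical, and the alignment program used for the figures in this document may count ambiguous residues (X) differently.

```python
def percent_identity(a: str, b: str) -> tuple[int, int, float]:
    """Return (identical columns, aligned columns, percent identity)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as identical only if both residues match and neither is a gap.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches, len(a), 100.0 * matches / len(a)


# Final block of the ORF133a / ORF133-1 alignment, as printed above.
orf133a = "SFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSYKFX"
orf133_1 = "SFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSYKFX"

matches, length, pct = percent_identity(orf133a, orf133_1)
print(f"{matches}/{length} identical ({pct:.1f}%)")
```

This final block on its own is somewhat less conserved than the 94.3% reported for the full 871 aa overlap, mostly because of the substitutions clustered around the CNDDNT/CNADKT motif.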

Homology with a predicted ORF from N. gonorrhoeae

ORF133 shows 92.3% identity over 392 aa overlap with a predicted ORF (ORF133ng) from N.
gonorrhoeae:

orf 133. pep 






orfl33ng 
orf 133. pep 
orf 133ng 
orf 133. pep 
orfl33ng 
orf 133. pep 
orfl33ng 
orf 133. pep 
orf 133ng 
orf 133. pep 
orf 133ng 



PG YYGS DDE FKRAFGEN S PTXKKHCNRSCG I 
MMI::II M M I I III: INI: IN: 
FYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAFGENSPAYKEHCDPSCGL 

YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 
M II II I N N N I I M I I I I N M I II M M : I N I N I M I I I I I N I 1 1 I N I M M 
YEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNIQEMYFSQIGDSGVHTAL 

KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS 

M 1 M M I M I 1 I MM IIMIMMINIIIIIIMMMIMINNN: 

KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVG 



31 



560 



91 



620 



151 



680 



STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 211 
I | || M | | M I I MM: M M N I I N I I II I I M M I M I 11 II I M M 

STGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 740 

S KE DQLKQG YG LS RV S AL PRD Y G RLE VGT RWLGN KLT LGGAMR Y FGKS I RAT AEER Y I DG 271 
I | | | I I I I I t I I 1 I I I I 1 1 I I I I 1 i I I I 1 I I I 1 i I I I I 1 1 I t 1 I 1 I I I I t I I 1 I f I I 1 I I 

SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKS1RATAEERYIDG 800 

TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLI FRAEVKNLFDRRYI DPL 3 3 1 

1 N I I N II I M I M I M N I M M I M 1 I I M I I M II M II N II II I M 

TNGGNTSNVRQLGKRS I KQTETIJ^QPL I FDFYMYEPKKNL I FRAEVKNLFDRRYI DPL 860 
orf 133. pep
orf 133ng 
orfl33.pep 
orf 133ng 



DAGN DAAXERYY S S FDPKDKDXDVTCN ADKT LCNGKYGGTSK5 VLTN FARGRT FLMTMS Y 
lllltl l::llf MM Mill limilNIIII||||||||IIMIMIIIIMIIII 
DAGN DAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS VLTN FARGRT FLMTMS Y 



391 
920 



KF 393 

I i 

KF 922 




The complete length ORF133ng nucleotide sequence <SEQ ID 881> is predicted to encode a 
protein having amino acid sequence <SEQ ID 882>:



1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV

51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN

101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD

151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK

201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY

251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY

301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLLNLEYD GVFNKYTAQF

351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYKSG RQKYPKGAKF

401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN

451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT

501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY

551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM

601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL

651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH

701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY

751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG

801 TNGGKTSKVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN

851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT

901 SKSVLTNFAR GRTFLMTMSY KF*



A variant was also identified, being encoded by the gonococcal DNA sequence <SEQ ID 883>:
1 ATGAGATCTT CTTTCCGGTT GAAGCCGATT TGTTTTTATC TTATGGGTGT

51 TATGCTATAT CATCATAGTT ATGCCGAAGA TGCAGGGCGC GCGGGCAGCG

101 AGGCGCAGAT ACAGGTTTTG GAAGATGTGC ACGTCAAGGC GAAGCGCGTA

151 CCGAAAGACA AAAAAGTGTT TACCGATGCG CGTGCCGTAT CGACCCGTca

201 gGATGTGTTC AAATCCGGCG AAAACCTCGA CAACATCGTA CGCAGCATAC

251 CCGGTGCGTT TACACAGCAA GATAAAAGCT CGGGCATTGT GTCTTTGAAT

301 ATTCGCGGCG ACAGCGGGTT CGGGCGGGTC AATACGATGG TGGACGGCAT

351 CACGCAGACC TTTTATTCGA CTTCTACCGA TGCGGGCAGG GCAGGCGGTT

401 CATCTCAATT CGGTGCATCT GTCGACAGCA ATTTTATTGC CGGACTGGAT

451 GTCGTCAAAG GCAGCTTCAG CGGCTCGGCA GGCATCAACA GCCTTGCCGG

501 TTCGGCGAAT CTGCGGACTT TAGGCGTGGA TGACGTCGTT CAGGGCAATA

551 ATACCTACGG CCTGCTGCTA AAAGGTCTGA CCGGCACCAA TTCAACCAAA

601 GGTAATGCGA TGGCGGCGAT AGGTGCGCGC AAATGGCTGG AAAGCGGAGC

651 GTCTGTCGGT GTGCTTTACG GGCACAGCAG GCGCGGCGTG GCGCAAAATT

701 ACCGCGTGGG CGGCGGCGGG CAGCACATCG GAAATTTTGG TGAAGAATAT

751 CTGGAACGGC GCAAACAGCA ATATTTTGTA CAAGAGGGTG GTTTGAAATT

801 CAATGCCGGC AGCGGAAAAT GGGAACGGGA TTTGCAAAGG CAATACTGGA

851 AAACAAAGTG GTATAAAAAA TACGAAGACC CCCAAGAACT GCAAAAATAC

901 ATCGAAGAGC ATGATAAAAG CTGGCGGGAA AACCTGGCGC CGCAATACGA

951 CATCACCCCC ATCGATCCGT CCGGCCTGAA GCAGCAGTCG GCAGGCAATC

1001 TGTTTAAATT GGAATACGAC GGCGTATTCA ATAAATACAC GGCGCAATTT

1051 CGCGATTTAA ACACCAGAAT CGGCAGCCGC AAAATCATCA ACCGCAATTA

1101 TCAATTCAAT TACGGTTTGT CTTTGAACCC GTATACCAAC CTCAATCTGA

1151 CCGCAGCCTA CAATTCGGGC AGGCAGAAAT ATCCGAAAGG GGCGAAGTTT

1201 ACAGGCTGGG GGCTTTTAAA AGATTTTGAA ACCTACAACA ACGCGAAAAT

1251 CCTCGACCTC AACAACACCG CCACCTTCCG GCTGCCCCGC GAAACCGAGT

1301 TGCAAACCAC TTTGGGCTTC AATTATTTCC ACAACGAATA CGGCAAAAAC

1351 CGCTTTCCTG AAGAATTGGG GCTGTTTTTC GACGGTCCTG ATCAGGACAA

1401 CGGGCTTTAT TCCTATTTGG GGCGGTTTAA GGGCGATAAA GGGCTGTTGC

1451 CTCAAAAATC AACCATTGTC CAACCGGCCG GCAGCCAATA TTTCAACACG

1501 TTCTACTTCG ATGCCGCGCT CAAAAAAGAC ATTTACCGCT TAAACTACAG

1551 CACCAATGCA ATCAACTACC GTTTCGGCGG CGAATATACG GGCTATTACG

1601 GCTCGGAAAA CGAATTTAAG CGGGCATTCG GAGAAAACTC GCCGGCATAC

1651 AAGGAACATT GCGACCCGAG CTGCGGGCTT TATGAACCCG TATTGAAAAA

1701 ATACGGCAAA AAGCGCGCCA ACAACCATTC GGTCAGCATT AGTGCGGACT

1751 TCGGCGATTA TTTCATGCCG TTCGCCGGCT ATTCGCGCAC ACACCGTATG

1801 CCCAACATCC AAGAAATGTA TTTTTCCCAA ATCGGCGACT CCGGCGTTCA

1851 CACCGCCTTA AAACCAGAGC GCGCAAACAC TTGGCAATTT GGCTTCAATA 

1901 CCTATAAAAA AGGATTGTTA AAACAAGATG ATATATTAGG ATTGAAACTG 

1951 GTCGGCTACC GCAGCCGCAT TGACAACTAC ATCCACAACG TTTACGGGAA 

2001 ATGGTGGGAT TTGAACGGGG ATATTCCGAG CTGGGTCGGC AGCACCGGGC 

2051 TTGCCTACAC CATCCGACAC CGCAATTTCA AAGACAAAGT GCACAAACAC 

2101 GGTTTTGAGC TGGAGCTGAA TTACGATTAT GGGCGTTTTT TCACCAACCT 

2151 TTCTTACGCC TATCAAAAAA GCACGCAACC GACCAATTTC AGCGATGCGA 

2201 GCGAATCGCC CAACAATGCC tCCaaAGAAG ACCAACTCAA ACAAGGTTAT 

2251 GGGCTGAGCA GGGTTTCCGC CCTGCCGCGA GATTACGGAC GTTTGGAAGT 

2301 CGGTACGCGC TGGTTGGGCA ACAAACTGAC TTTGGGCGGC GCGAtgcGCT 

2351 ATTTCGGCAA GAGCATCCGC GCGACGGCTG AAGAACGCTA TATCGACGGC 

2401 ACCAACGGGG GAAATACCAG CAATGTCCGG CAACTGGGCA AGCGTTCCAT 

2451 CAAACAAACC GAAACCCTTG CCCGACAGCC TTTGATTTTT GATTTTTACG 

2501 CCGCTTACGA GCCGAAGAAA AACCTTATTT TCCGCGCCGA AGTCAAAAAC 

2551 CTGTTCGACA GGCGTTATAT CGATCCGCTC GATGCGGGCA ATGATGCGGC 

2601 AACGCAGCGT TATTACAGCT CGTTCGACCC GAAAGACAAG GACGAAGACG 

2651 TAACGTGTAA TGCTGATAAA ACGTTGTGCA ACGGCAAATA CGGCGGCACA

2701 AGCAAAAGCG TATTGACCAA TTTCGCACGC GGACGCACCT TCTTGATGAC 

2751 GATGAGCTAC AAGTTTTAA 
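Sequence listings of this kind are printed in rows of five ten-residue blocks, each row prefixed by the coordinate of its first residue. A minimal sketch of how such rows might be joined back into one contiguous string is given below; the function name `join_sequence_rows` is hypothetical, and the sketch assumes only that a leading all-digit field is a coordinate and that every other whitespace-separated field is sequence.

```python
def join_sequence_rows(rows):
    """Concatenate coordinate-prefixed sequence rows into one string."""
    residues = []
    for row in rows:
        fields = row.split()
        # Drop the leading coordinate (e.g. "2701") if one is present.
        if fields and fields[0].isdigit():
            fields = fields[1:]
        residues.extend(fields)
    return "".join(residues)


# Last two rows of the gonococcal sequence printed above.
rows = [
    "2701 AGCAAAAGCG TATTGACCAA TTTCGCACGC GGACGCACCT TCTTGATGAC",
    "2751 GATGAGCTAC AAGTTTTAA",
]
tail = join_sequence_rows(rows)
print(len(tail), tail[-3:])
```

The final row starts at coordinate 2751 and holds 19 bases, so the full sequence is 2769 bases long, that is 923 codons, which agrees with a 922-residue protein followed by a stop codon.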

This corresponds to the amino acid sequence <SEQ ID 884; ORF133ng-l>: 

1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV

51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN 

101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD 

151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK

201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY 

251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY 

301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLFKLEYD GVFNKYTAQF 

351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF 

401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN 

451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT 

501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY 

551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM 

601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL 

651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH 

701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY 

751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG 

801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN 

851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT 

901 SKSVLTNFAR GRTFLMTMSY KF* 

ORF133ng-1 and ORF133-1 show 96.2% identity in 889 aa overlap:

10 20 30 40 50 60 

orf 133ng-l .pep SFRLKPICFYLMGVMLYHHSYAEDAGRAGSEAQIQVLEDVHVKAKRVPKDKKVFTDARAV 

I 1 t I I I I I 1 1 1 1 I 1 1 I I I I 1 I I t I 1 I I 1 I I 
orf 133-1 EAQIQVLEDVHVKAKRVPKDKKVFTDARAV 

10 20 30 



70 80 90 100 110 120 

orf 133ng-l .pep STRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITOTFYS 
| | | | | : 1 1| : I I li I 1 11 1 I I I I I I I I I I I I 1 I M I I I I I M I I 1 1 I I 1 1 I I I J I M I I I 
orf 133-1 STRQDIFKSSENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 

40 50 60 70 80 90 

130 140 150 160 170 180 

orf 133ng-l .pep TSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFSGSAGINSLAGSANLRTLGVDDWQGN 

nun 1 1 ii i tiiuiM unit i iimiii 1 1 1 1 e 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 

orf 133-1 TSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFSGSAGINSLAGSANLRTLGVDDWQGN 
100 110 120 130 140 150 

190 200 210 220 230 240 

orf 133ng-l .pep NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRGVAQNYRVGGGGQHI 
| 1 1 M I I M M I I 1 1 I 1 1 I I I 1 1 I I I I I I I I I M I I I I I I I I I 1 I : I It I 1 1 I 1 1 1 1 I I I 
orf 133-1 NTYGLLLKGLTGTNSTKGKAMAAIGARKWLESGASVGVLYGHSRRSVAQNYRVGGGGQHI 
160 170 180 190 200 210 

250 260 270 280 290 300 

orf 133ng-l .pep GNFGEEYI^RRKQQYFVQEGGUCFNAGSGKWERDLQRQxWTKWYKKYEDPQELQKYIEE 
ll||||:llll»|:IMI: ||||||MlM II I 11:1:: MM « « <

orf 133-1 GNrGAEYLERRKQRYFVQEGALKFNSDSGKWERDLQRQQWKYKPYKNYNN-QELQKYIEE 
220 230 240 250 260 

310 320 330 340 350 360 

orfl33ng-l.pep hdkswreniapqyditpidpsglkqqsagnlfkleydgvfnkytaqfrdlntrigsrkii 
t i I I I I 1 I I |||MlHI!l:MlllliMlilMMmillllllI!ll:MIIIII 
orf 133-1 HDKSWRENLXPQYDITPIDPSSLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTKIGSRKII 
270 280 290 300 310 320 

370 380 390 400 410 420 

orr 33ng-l.pep NRNYQFNYGLSLNPYTOLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 

I 1 I 1 I I 1 I I I I 1 i I I 1 t I I I I I E 1 I I I I I I I I I 1 = i I I I t I 1 I I I | i I I I i t I I i 

orf 33-1 NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGSKFTGWGLLKDFETYNNAKILDLNNT 
330 340 350 360 370 380 

430 440 450 460 470 480 

orf 133ng-l . pep ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRrKGDKGLL 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ( 1 1 1 1 1 1 1 

orf 133-1 ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 
390 400 410 420 430 440 

490 500 510 520 530 540 

orfl33ng-l.pep PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAF 
t M I I I I I 1 1 I I I I M I I I I I I I H I H I I M M I I : : : I I I 1 1 I M U I I I : : M I I I 1 
orf 133-1 PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNTVGYRFGGEYTGYYGSDDEFKRAF 
450 460 470 480 490 500 

550 560 570 580 590 600 

orfl33ng-l.pep GENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPN1 
|||||:||:||: I } I : I I I I I I I ! 1 I I I I I t I I It I I II I I I I I I 1 1 I : I I t I t I M I I 
orf 133-1 GENSPTYKKHCNRSCGIYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSR7HRMPNI 
510 520 530 540 550 560 

610 620 630 640 650 660 

orf 133ng-l . pep QEMYFSCIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHN 
I | I I i I I! i I I I I I I I I I I I i I ill I I ! M I I I I I I I I I I I MlllinilMMIIII 
orf 133-1 QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDTLGLKLVGYRSRIDNYIHN 
570 580 590 600 610 620 

670 680 690 700 710 720 

orfl33no-l.pep VYGKWWDLNGDIPSWVGSTGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 
| | | | i I ! 1 I 1 I 1 I II I : II I I I I II: I I I I I I I I I I I II I II I I I I I I I I I I II I I I I I I 
orf 133-1 VYGKWWDLNGDIPSWVSSTGLAYTIQHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 
630 640 650 660 670 680 

730 740 750 760 770 7B0 

orfl33ng-l.pep STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR 
M I I I I I I II I I II I I I I I M I I I M i I I I I I I I I I I M I M I I I I I I I I I I I I I II I H 
orf 133-1 STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR 
690 700 710 720 730 740 

790 800 810 820 830 840 

orfl33nq-l.pep YFGKSIRATAEERYIDGTNGGNTSKVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLI 
liMIIIIIIMIIIMIIMIill IMIIIIIIIIMIIIIIIMIIIilllilMII 
orf 133-1 YFGKSIRATAEERYIDGTNGGNTSNFRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLI 
750 760 770 780 790 800 

850 860 870 880 890 900 

orf!33ng-l.pep FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS 
I I I 1 1 1 1 1 ] I I I I I I I I I I f : 1 1 1 1 1 1 I I I I I ! I I I 1 I I I 1 1 1 I f 1 1 t 1 1 1 1 I I 1 I 1 1 1 1 
orf 133-1 FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS 
810 820 830 840 850 860 



910 920 
orfl33ng-l.pep V LTN FARGRT FLMTM S YKFX 
I I I I I I I I I I I 1 I I I 1 1 I I I 
orf 133-1 VLTN FARGRT FLMTM S YKFX 

870 880 



In addition, ORF133ng-1 is homologous to a TonB-dependent receptor in H. influenzae:
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8 p,P4SUmC17 HAEIN PROBABLE TONS- DE PENDER RECEPTOR HI ^PRECURSOR _ 
Sein l precursor (tbpl) [Haemophilus influenzae] Length - 913 

5 zs&f*. 8W?«a«rssStii - <«•«. g - s - 12/921 ,w 

Query: 38 QVLEDVH^VPKD^^ » 
10 Sbjct: 29 ETLGQ I DVVEKV I SNDiSpFTEAKAKSTRENVFKETQT I DQV IRS I PGAFTQQDKGSGW 88 

^ y - S+NIRG++G GRVNTMVDG+TQTFYST+ D + G ++ GGSSQFGA*+D NFIAG+DV K +FS 

Sbjct: 89 SVNIRGENGLGRVNTMVDGVTQTFYSTALDSGQSGGSSQFGAAI DPNFIAGVDVNKSNFS 148 

15 Query 158 GSAGINSLAGSANLRTIXBVDDWQXXXXXX^ 217 

sbjct: i 4 9 SSSSS— ^ 208 

20 Query: 218 SVGVLYGHSRRWAQNYRVGG^ 277 

l,L 20 9 jsasiysssss^^ 2 - 

Query: 278 LQRQYWK TKWY S^XST™ ^ 

Sbjct: 266 LSKKIWSCNKPDYQKNGDCSYYRIGSAAKTRREILQELLTNGKKPKDIEKLQKGNDGIEE 325 

Qu ery: 304 HOKSWR «-g™«— 

Sbjct: 326 TDKSFERN-KDQYSVAPIEPGSLQSRSRSHLLKFEYGDDHQNLGAQLRTLDKKIGSRKIE 384 
Ouerv 364 NRNYQFWYGLSI^PYTNLKLTAAYNSGRQKYPKGAKFTGWGLLKDFETY^AKILDLNNT 423 

S„C4: .«5 ^FL«Kli;ISSn!lSsKEEL3L™DA S HD 0 < J LYS» S K I «GRYSGTKS *>. 
40 482 LLPQKSTIVQPAGSQYnWrYFDAAUtKDIYRWYSTNftlHYRFG^YTGYYGSEHEEKB 541 

HV yu " J LLPO+S I+QP+G Q F T YFD AL K IY LNYS N +Y F GEY GY 

Sbjct- 505 LLp5 R iviLQPSGKQKFKTVYFDTALSKGIYHLNYSVNrrHYAF«GEYVGY 555 

Query. 542 -GE.SPAY f HCOPSCGLYEPV^ «~ 

Sbjct: 556 — ENTAGQQ iNEPILHKSGHKKAFNHSATLSAELSDYFMPFFTYSRTHRMP 604 

Query: 602 NIQEMYFSQIGDSGVHTALKPERANTWQreF^ 661 
50 Tbjct: 605 SSSSS^^^^^^ - 

Query. 662 n—™^ ™ 
Sbjct: 665 HNVYGVWW — rdq^E^AESNGFK^IAHQHYKPIVKKSGVELEINYDMGRFFANVSYAY 722 

55 Query: 722 QKSTQPTNFSDASESPNNASKEDQLKQ^YGLSRVSALPRDYCT "1 

SbjcL 723 ^SS^PrS 782 
60 Query: 782 ---XRATAEER™ «« 

Sbjct: 783 jS^^^^-^^^^^^ 1 ^ 11 ^^^ 
Query 842 L I FRAEVKNLFDRRY I OPLDAGNDAATQRYY SS FDPKDKDEDVTOTADKTLCNGKYGGT S 901 
A.K LI +AEV+NL D+RY+ DPLDAGN DAA+QRYY SS + * <- „ Z. Zl* OQ , 
65 Sbjct: 842 LIIK^EVQNLLDKRYVDPLDAGNDAASQRYYSSL NNSIECAQDSSAC 892 

Query 902 KSVLTNFARGRTFLMTMSYKF 922 
K+VL N FARGRT+* ♦ +•■*■+ YKF 
70 Sbjct: 893 KTVLYNFARGRTYILSLNYKF 913 
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The underlined motif in the gonococcal protein (also present in the meningococcal protein) is 
predicted to be an ATP/GTP-binding site motif A (P-loop), and the analysis suggests that these 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for
vaccines or diagnostics, or for raising antibodies. 
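
The ATP/GTP-binding site motif A (P-loop) named above is catalogued in PROSITE as the pattern [AG]-x(4)-G-K-[ST]. A minimal sketch of how a protein sequence could be scanned for that pattern is given below; the function name and the demo sequence are hypothetical and are not taken from ORF133.

```python
import re

# PROSITE-style P-loop pattern [AG]-x(4)-G-K-[ST], written as a regular expression.
# The lookahead lets overlapping matches be reported as well.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein):
    """Return (1-based start position, matched text) for each P-loop match."""
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(protein.upper())]


# Hypothetical demo sequence, used only to exercise the function.
demo = "MKTLLGPPGSGKSTLAR"
print(find_p_loops(demo))
```

A hit only indicates that the sequence fits the pattern; it is not evidence of nucleotide binding on its own.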

Example 104

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 885> 

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT
51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT
101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG
151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT
201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA
251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG
301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT
351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG
401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG
451 AAAGAAAAAA ACAGCGTGAT CAATGTGCGC GAAATGTTGC CCGACCAT..

This corresponds to the amino acid sequence <SEQ ID 886; ORF112>:

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML
51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL
101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL
151 KEKNSVINVR EMLPDH...
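
The correspondence between a nucleotide listing and its amino acid sequence can be checked by translating the reading frame codon by codon. The sketch below assumes the standard genetic code and reading frame 1; the helper name `translate` is illustrative, and any codon containing an ambiguity code is reported as X.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate frame 1 of a DNA string; whitespace is ignored, unknown codons give X."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # drop a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# First 60 nucleotides of SEQ ID 885 as listed above.
first_60 = "ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT TTACGCGCTC"
print(translate(first_60))  # MNLISRYIIRQMAVMAVYAL
```

The printed peptide matches the first 20 residues of SEQ ID 886.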

Further work revealed a further partial nucleotide sequence <SEQ ID 887>:

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT
51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT
101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG
151 gGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT
201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA
251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG
301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT
351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG
401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG
451 AAAGAAAAAA ACAGCrTkAT CAATGTGCGC GAAATGTTGC CCGACCATAC
501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG
551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG
601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC
651 TATTGCGGCT GAAGAAAACT GGCCGATTTC CGTCAAACGC AACCTGATGG
701 ACGTATTGCT CGTCAAACCC GACCAAATGT CCGTCGGCGA ACTGACCACC
751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCGAA TCTACGCCAT
801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC
851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC
901 TTAAAACTCT TCGGCGGCAT CTGTsTCGGA TTGCTGTTCC ACCTTGCCGG
951 ACGGCTCTTT GGGTTTACCA GCCAACTCGG ...

This corresponds to the amino acid sequence <SEQ ID 888; ORF112-1>:

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML
51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL
101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL
151 KEKNSXINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ
201 LKNIRRSTLG EDKVEVSIAA EENWPISVKR NLMDVLLVKP DQMSVGELTT
251 YIRHLQNNSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG
301 LKLFGGICXG LLFHLAGRLF GFTSQL...

Computer analysis of this amino acid sequence predicts two transmembrane domains and gave the 
following results: 






Homology with a predicted ORF from N. meningitidis (strain A)

ORF112 shows 96.4% identity over a 166aa overlap with an ORF (ORF112a) from strain A of N.
meningitidis:

orf112.pep  MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR   60
            ||||||||||||||||||||||||||||||||||||||||||||||||| ||||||| ||
orf112a     MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR   60

orf112.pep  AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW  120
            ||||:||||||||||| ||||||||| |||||||||||||||||||||||||||||||||
orf112a     AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW  120

orf112.pep  VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH                166
            |||||||||||||||||||||||||||||||||||:||||||||||
orf112a     VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN  180

orf112a     ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP  240
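
The identity figure quoted for an alignment of this kind can be checked by counting identical positions over the overlap. The sketch below assumes an ungapped overlap starting at residue 1 and counts an undetermined residue (X) as a mismatch; the function name is illustrative, and the two sequences are ORF112 (SEQ ID 886) and the first 180 residues of ORF112a (SEQ ID 890) as read from the listings.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over the ungapped overlap of two sequences."""
    overlap = min(len(seq_a), len(seq_b))
    if overlap == 0:
        raise ValueError("empty sequence")
    matches = sum(
        1
        for a, b in zip(seq_a[:overlap], seq_b[:overlap])
        if a == b and a != "X"  # X is an undetermined residue, not a match
    )
    return 100.0 * matches / overlap


ORF112 = (
    "MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEML"
    "GYTALKMPARAYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLL"
    "LILSQFGFIFAIATVALGEWVAPTLSQKAENIKAAAINGKISTGNTGLWL"
    "KEKNSVINVREMLPDH"
)
ORF112A = (
    "MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMX"
    "GYTALKMXARAYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLL"
    "LILSQFGFIFAIATVALGEWVAPTLSQKAENIKAAAINGKISTGNTGLWL"
    "KEKNSIINVREMLPDHTLLGIKIWARNDKN"
)

print(round(percent_identity(ORF112, ORF112A), 1))  # 96.4
```

Six of the 166 overlapping positions differ (four of them undetermined residues in ORF112a), giving 160/166, which rounds to the 96.4% stated above.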

The ORF112a nucleotide sequence <SEQ ID 889> is:

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT
51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT
101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGNTG
151 GGNTACACCG CCCTCAAAAT GNCCGCCCGC GCCTACGAAC TGATGCCCCT
201 CGCCGTCCTT ATCGGCGGAC TGGTCTCTNT CAGCCAGCTT GCCGCCGGCA
251 GCGAACTGAN CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG
301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT
351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG
401 CCGCGGCCAT CAACGGCAAA ATCAGTACCG GCAATACCGG CCTTTGGCTG
451 AAAGAAAAAA ACAGCATTAT CAATGTGCGC GAAATGTTGC CCGACCATAC
501 CCTGCTGGGC ATTAAAATCT GGGCCCGCAA CGATAAAAAC GAACTGGCAG
551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG
601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC
651 TATTGCGGCT GAAGAAAANT GGCCGATTTC CGTCAAACGC AACCTGATGG
701 ACGTATTGCT CGTCAAACCC GACCAAATGT CCGTCGGCGA ACTGACCACC
751 TACATCCGCC ACCTCCAAAN NNACAGCCAA AACACCCGAA TCTACGCCAT
801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC
851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC
901 TTAAAANTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG
951 NCGGCTCTTC NGGTTTACCA GCCAACTCTA CGGCATCCCG CCCTTCCTCG
1001 KCGGCGCACT ACCTACCATA GCCTTCGCCT TGCTCGCCGT TTGGCTGATA
1051 CGCAAACAGG AAAAACGCTA A

This encodes a protein having the amino acid sequence <SEQ ID 890>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEMX
51 GYTALKMXAR AYELMPLAVL IGGLVSXSQL AAGSELXVIK ASGMSTKKLL
101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL
151 KEKNSIINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ
201 LKNIRRSTLG EDKVEVSIAA EEXWPISVKR NLMDVLLVKP DQMSVGELTT
251 YIRHLQXXSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG
301 LKXFGGICLG LLFHLAGRLF XFTSQLYGIP PFLXGALPTI AFALLAVWLI
351 RKQEKR*

ORF112a and ORF112-1 show 96.3% identity in 326 aa overlap:

orf112a.pep  MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR
             ||||||||||||||||||||||||||||||||||||||||||||||||| ||||||| ||
orf112-1     MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR

orf112a.pep  AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW
             ||||:||||||||||| ||||||||| |||||||||||||||||||||||||||||||||
orf112-1     AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW

orf112a.pep  VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN
             ||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||
orf112-1     VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN

orf112a.pep  ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP
             |||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||
orf112-1     ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP

orf112a.pep  DQMSVGELTTYIRHLQXXSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG
             ||||||||||||||||  ||||||||||||||||||||||||||||||||||||||||||
orf112-1     DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG

orf112a.pep  LKXFGGICLGLLFHLAGRLFXFTSQLYGIPPFLXGALPTIAFALLAVWLIRKQEKRX
             || ||||| ||||||||||| |||||
orf112-1     LKLFGGICXGLLFHLAGRLFGFTSQL

Homology with a predicted ORF from N. gonorrhoeae

ORF112 shows 95.8% identity over a 166aa overlap with a predicted ORF (ORF112ng) from N.
gonorrhoeae:

orf112.pep  MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR   60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf112ng    MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR   60

orf112.pep  AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW  120
            ||||:|||||||||:|||||||||||:||||||||||||||||||||||||||:||||||
orf112ng    AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW  120

orf112.pep  VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH                166
            |||||||||||||||||||||||||||||||||:|:|||| |||||
orf112ng    VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN  180

The complete length ORF112ng nucleotide sequence <SEQ ID 891> is:

1 ATGAACCTGA TTTCACGTTA CATCATCCGC CAAATGGCGG TTATGGCGGT
51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT
101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG
151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TCATGCCCCT
201 CGCCGTCCTC ATCGGCGGAC TGGCCTCTCT CAGCCAGCTT GCCGCCGGCA
251 GCGAACTGGC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG
301 TTGATTCTGT CTCAGTTCGG TTTTATTTTT GCTATTGCCG CCGTCGCGCT
351 CGGCGAATGG GTTGCGCCCA CGCTGAGCCA AAAAGCCGAA AACATCAAag
401 cCGCCGCCAt taacggCAAA ATCAGCAccg gcAATACCGG CCTTTggcTG
451 AAAGAAAAAa CCAGCATTAT CAATGTGcGc GGAATGTTGC CCGACCATAC
501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG
551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGCTGGCAG
601 TTGAAAAACA TCCGCCGCAG CATCATGGGT ACAGACAAAA TCGAAACATC
651 cgCCGCCGCC GAAGAAACTT gGCCGATTGC CGTCAGACGC AACCTGATGG
701 ACGTATTGCT CGTCAAGCCC GACCAAATGT CCGTCGGCGA GCTGACCACC
751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCAAA TCTACGCCAT
801 CGCATGGTGG CGTAAACTCG TTTACCCCGT CGCCGCATGG GTCATGGCGC
851 TCGTTGCCTT CGCCTTTACG CCGCAAACCA CGCGCCACGG CAATATGGGC
901 TTAAAACTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG
951 CAGGCTCTTC GGGTTTACCA GCCAACTCTA CGGCACCCCA CCCTTCCTCG
1001 CCGGCGCACT GCCTACCATA GCCTTCGCCT TGCTCGCTGT TTGGCTGATA
1051 CGCAAACAGG AAAAACGTTG A



This encodes a protein having amino acid sequence <SEQ ID 892>:

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML
51 GYTALKMPAR AYELMPLAVL IGGLASLSQL AAGSELAVIK ASGMSTKKLL
101 LILSQFGFIF AIAAVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL
151 KEKTSIINVR GMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ
201 LKNIRRSIMG TDKIETSAAA EETWPIAVRR NLMDVLLVKP DQMSVGELTT
251 YIRHLQNNSQ NTQIYAIAWW RKLVYPVAAW VMALVAFAFT PQTTRHGNMG
301 LKLFGGICLG LLFHLAGRLF GFTSQLYGTP PFLAGALPTI AFALLAVWLI
351 RKQEKR*

ORF112ng and ORF112-1 show 94.2% identity in 326 aa overlap:

10 20 30 40 50 60 

orfll2na tWLISRYIIRQMAVMAVYAIJ^ 

I M M I I I I I I I M I I 1 1 I I I M I I I I I I i I M I M I I M I I II I I M I M Ml I Ml I I 
orf 112-1 MNLI SRY 1 1 RQMAVMAVYALLAFLAL YS FFE I L YETGN LGKGS YG I WEMLG YTALKMPAR 

10 20 30 40 50 -60 

70 80 90 100 110 120 

orfll2nq AYELMPIJVVLIGGIASLSQIJ^GSEIAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW 

I I I 1 r I | I I 1 I I I I r 1 1 II I I I I I I I r I I I 1 I I I 1 I I I I I I 1 1 I I 1 I I * ' ' JL* * 

orf 112-1 AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

70 80 90 100 110 120 

130 140 150 160 170 180 

orfll2nq VAPTLSQKAENIKAAAINGKI STGNTGLWLKEKTS I INVRGMLPDHTLLGIKIWARNDKN 

I I I I I 1 1 I 1 I 1 1 1 1 I I I I I 1 1 1 I I I I I 1 1 t I I 1 * I MM llllillMllllllllll 
orf 112-1 VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN 

130 140 150 160 170 180 

190 200 210 220 230 240 

orfll2nq ELAEAVEADSAVLNSDGSWQLKNIRRSIMGTDKIETSAAAEETWPIAVRRNLMDVLLVKP 
Ml llllllll 1111111 Ml MM II :l MMM I I I I : I I I : I : I M I I I I I I I I 
orf 112-1 ELAEAVEADSAVl^SDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNliMDVLLVKP 

190 200 210 220 230 240 

250 260 270 280 290 300 

orfll2ng dqmsvgelttyirhlc^nsqktqiyaiavwrklvypvaawvmalvafaftpqttrhgnmg 

I || MM II I (III I II I I II 1:1 I HI Ml M 1 ||:|| I I llllllll 11111111111 

orf 112-1 dqmsvgelttyirhlqnnsqntriyaiawwrklvypaaawvmalvafaftpqttrhgnmg 

250 260 270 280 290 300 

310 320 330 340 350 

orfll2ng lklfggiclgllfhlagrlfgftsqlygtppflagalptiafallavwlirkqekrx 

I M I I M I II I M II I II I I M I M 
orfll2-l LKLFGGICXGLLFHLAGRLFGFTSQL 

310 320 



This analysis suggests that these proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



It will be appreciated that the invention has been described by means of example only, and that 
modifications may be made whilst remaining within the spirit and scope of the invention. 
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TABLE I -PCR primers 



ORF      Primer   Sequence  Restriction sites

ORF 1    Forward  CGCGGATCCGCTAGC-GGACACACTTATTTCGG  BamHI-NheI
         Reverse  CCCGCTCGAG-CCAGCGGTAGCCTAATT  XhoI
ORF 2    Forward  GCGGATCCCATATG-TTTGATTTCGGTTTGGG  BamHI-NdeI
         Reverse  CCCGCTCGAG-GACGGCATAACGGCG  XhoI
ORF 2-1  Forward  GCGGATCCCATATG-TTTGATTTCGGTTTGGG  BamHI-NdeI
         Reverse  CCCGCTCGAG-TGATTTACGGACGCGCA  XhoI
ORF 4    Forward  GCGGATCCCATATG-TGCGGAGGTCAAAAAGAC  BamHI-NdeI
         Reverse  CCCGCTCGAG-TTTGGCTGCGCCTTC  XhoI
ORF 5    Forward  GGAATTCCATATGGCCATGG-TGGAAGGCGCACAACC  NdeI-NcoI
         Forward  CGGGATCC-ATGGAAGGCGCACAAC  BamHI
         Reverse  CCCGCTCGAG-GACTGTGCAAAAACGG  XhoI
ORF 6    Forward  CGCGGATCCCATATG-ACCCGTCAATCTCTGCA  BamHI-NdeI
         Reverse  CCCGCTCGAG-TGCGCCGAACACTTTC  XhoI
ORF 7    Forward  CGCGGATCCGCTAGC-GCGCTGCTTTTTGTTCC  BamHI-NheI
         Reverse  CCCGCTCGAG-TTTCAAAATATATTTGCGGA  XhoI
ORF 8    Forward  GCGGATCCCATATG-GCTCAACTGCTTCGTAC  BamHI-NdeI
         Reverse  CCCGCTCGAG-AGCAGGCTTTGGCGC  XhoI
ORF 9    Forward  CGCGGATCCCATATG-CCGAAGGAAGTCGGAAA  BamHI-NdeI
         Reverse  CCCGCTCGAG-TTTCCGAGGTTTTCGGG  XhoI
ORF 10   Forward  GCGGATCCCATATG-GACACAAAAGAAATCCTC  BamHI-NdeI
         Reverse  CCCGCTCGAG-TAATGGGAAACCTTGTTTT  XhoI
ORF 11   Forward  GCGGATCCCATATG-GCGGTCAACCTCTACG  BamHI-NdeI
         Reverse  CCCGCTCGAG-GGAAACGACTTCGCC  XhoI
ORF 13   Forward  CGCGGATCCCATATG-GCTCTGCTTTCCGCGC  BamHI-NdeI
         Reverse  CCCGCTCGAG-AGGGTGTGTGATAATAAG  XhoI
ORF 15   Forward  GGAATTCCATATGGCCATGG-GCGGGACACTGACAG  NdeI-NcoI
         Forward  CGGGATCC-TGCGGGACACTGACAGG  BamHI
         Reverse  CCCGCTCGAG-AGGTTGGCCTTGTCTATG  XhoI
ORF 17   Forward  GGAATTCCATATGGCCATGG-TTGCCGGCCTGTTCG  NdeI-NcoI
         Forward  CGGGATCC-ATTGCCGGCCTGT!Lb  BamHI
         Reverse  CCCGCTCGAG-AAGCAGGTTGTACAGC  XhoI


ORF 18   Forward  GCGGATCCCATATG-ATTTTGCTGCATTTGGAT  BamHI-NdeI
         Reverse  CCCGCTCGAG-TCTTCCAATTTCTGAAAGC  XhoI
ORF 19   Forward  GGAATTCCATATGGCCATGG-TCGCCAGTGTTTTTACC  NdeI-NcoI
         Forward  CGGGATCC-TTCGCCAGTGTTTTTACCG  BamHI
         Reverse  CCCGCTCGAG-GGTGTTTTTGAAGCTGCC  XhoI
ORF 20   Forward  GGAATTCCATATGGCCATGG-TCGGCGCGGGTATG  NdeI-NcoI
         Forward  CGGGATCC-TTCGGCGCGGGTATG  BamHI
         Reverse  CCCGCTCGAG-CGGCGAGCGAGAGCA  XhoI
ORF 22   Forward  GGAATTCCATATGGCCATGG-TGATTAAAATCAAAAAAGGTCT  NdeI-NcoI
         Forward  CGGGATCC-ATGATTAAAATCAAAAAAGGTCTAAACC  BamHI
         Reverse  CCCGCTCGAG-ATTATGATAGCGGCCC  XhoI
ORF 23   Forward  CGCGGATCCCATATG-GATGTTTCTGTTTCAGAC  BamHI-NdeI
         Reverse  CCCGCTCGAG-TTTAAACCGATAGGTAAACG  XhoI
ORF 24   Forward  GGAATTCCATATGGCCATGG-TGATGCCGGAAATGGTG  NdeI-NcoI
         Forward  CGGGATCC-ATGATGCCGGAAATGGTG  BamHI
         Reverse  CCCGCTCGAG-TGTCAGCGTGGCGCA  XhoI
ORF 25   Forward  GCGGATCCCATATG-TATCGCAAACTGATTGC  BamHI-NdeI
         Reverse  CCCGCTCGAG-ATCGATGGAATAGCCG  XhoI
ORF 26   Forward  GCGGATCCCATATG-CAGCTGATCGACTATTC  BamHI-NdeI
         Reverse  CCCGCTCGAG-GACATCGGCGCGTTTT  XhoI
ORF 27   Forward  GGAATTCCATATGGCCATGG-AGACCTATTCTGTTTA  NdeI-NcoI
         Forward  CGGGATCC-CAGACCTATTCTGTTTATTTTAATC  BamHI
         Reverse  CCCGCTCGAG-GGGTTCGATTAAATAACCAT  XhoI
ORF 28   Forward  GGAATTCCATATGGCCATGG-ACGGCTGTACGTTGATGT  NdeI-NcoI
         Forward  CGGGATCC-AACGGCTGTACGTTGATG  BamHI
         Reverse  CCCGCTCGAG-TTTGTCAGAGGAATTCGCG  XhoI
ORF 29   Forward  GCGGATCCCATATG-AACGGTTTGGATGCCCG  BamHI-NdeI
         Forward  CGCGGATCCGCTAGC-AACGGTTTGGATGCCCG  BamHI-NheI
         Reverse  CCCGCTCGAG-TTTGTCTAAGTTCCTGATATG  XhoI
ORF 32   Forward  CGCGGATCCCATATG-AATACTCCTCCTTTTG  BamHI-NdeI
         Reverse  CCCGCTCGAG-GCGTATTTTTTGATGCTTTG  XhoI
ORF 33   Forward  GCGGATCCCATATG-ATTGATAGGGATCGTATG  BamHI-NdeI
         Reverse  CCCGCTCGAG-TTGATCTTTCAAACGGCC  XhoI
ORF 35   Forward  GCGGATCCCATATG-TTCAGAGCTCAGCTT  BamHI-NdeI
         Forward  CGCGGATCCGCTAGC-TTCAGAGCTCAGCTT  BamHI-NheI
         Reverse  CCCGCTCGAG-AAACAGCCATTTGAGCGA  XhoI
ORF 37   Forward  GCGGATCCCATATG-GATGACGTATCGGATTTT  BamHI-NdeI
         Reverse  CCCGCTCGAG-ATAGCCCGCTTTCAGG  XhoI


ORF 58   Forward  CGCGGATCCGCTAGC-TCCGAACGCGAGTGGAT  BamHI-NheI
         Reverse  CCCGCTCGAG-AGCATTGTCCAAGGGGAC  XhoI
ORF 65   Forward  GGAATTCCATATGGCCATGG-TGCTGTATCTGAATCAAG  NdeI-NcoI
         Forward  CGGGATCC-TTGCTGTATCTGAATCAAGG  BamHI
         Reverse  CCCGCTCGAG-CCGCATCGGCAGACA  XhoI
ORF 66   Forward  GCGGATCCCATATG-TACGCATTTACCGCCG  BamHI-NdeI
         Reverse  CCCGCTCGAG-TGGATTTTGCAGAGATGG  XhoI
ORF 72   Forward  CGCGGATCCCATATG-AATGCAGTAAAAATATCTGA  BamHI-NdeI
         Reverse  CCCGCTCGAG-GCCTGAGACCTTTGCAA  XhoI
ORF 73   Forward  GCGGATCCCATATG-AGATTTTTCGGTATCGG  BamHI-NdeI
         Reverse  CCCGCTCGAG-TTCATCTTTTTCATGTTCG  XhoI
ORF 75   Forward  GCGGATCCCATATG-TCTGTCTTTCAAACGGC  BamHI-NdeI
         Reverse  CCCGCTCGAG-TTTGTTTTTGCAAGACAG  XhoI
ORF 76   Forward  GATCAGCTAGCCATATG-AAACAGAAAAAAACCGC  NheI-NdeI
         Reverse  CGGGATCC-TTACGGTTTGACACCGTT  BamHI
ORF 79   Forward  CGCGGATCCCATATG-GTTTCCGCCGCCG  BamHI-NdeI
         Reverse  CCCGCTCGAG-GTGCTGATGCGCTTCG  XhoI
ORF 83   Forward  GCGGATCCCATATG-AAAACCCTGCTGCTGC  BamHI-NdeI
         Reverse  CCCGCTCGAG-GCCGCCTTTGCGGC  XhoI
ORF 84   Forward  GCGGATCCCATATG-GCAGAGATCTGTTTG  BamHI-NdeI
         Reverse  CCCGCTCGAG-GTTTGCCGATCCGACCA  XhoI
ORF 85   Forward  CGCGGATCCCATATG-GCGGTTTGGGGCGGA  BamHI-NdeI
         Reverse  CCCGCTCGAG-TCGGCGCGGCGGGC  XhoI
ORF 89   Forward  GGAATTCCATATGGCCATGG-CCATACCTTCTTATCA  NdeI-NcoI
         Forward  CGGGATCC-GCCATACCTTCTTATCAGAG  BamHI
         Reverse  CCCGCTCGAG-TTTTTTGCGATTAGAAAAAGC  XhoI
ORF 97   Forward  GCGGATCCCATATG-CATCCTGCCAGCGAAC  BamHI-NdeI
         Reverse  CCCGCTCGAG-TTCGCCTACGGTTTTTTG  XhoI
ORF 98   Forward  GCGGATCCCATATG-ACGGTAACTGCGG  BamHI-NdeI
         Reverse  CCCGCTCGAG-TTGTTGTTCGGGCAAATC  XhoI


ORF 100  Forward  GCGGATCCCATATG-TCGGGCATTTACACCG  BamHI-NdeI
         Reverse  CCCGCTCGAG-ACGGGTTTCGGCGGAA  XhoI
ORF 101  Forward  GCGGATCCCATATG-ATTTATCAAAGAAACCTC  BamHI-NdeI
         Reverse  CCCGCTCGAG-TTTTCCGCCTTTCAATGT  XhoI
ORF 102  Forward  GCGGATCCCATATG-GCAGGGCTGTTTTACC  BamHI-NdeI
         Reverse  CCCGCTCGAG-AAACGGTTTGAACACGAC  XhoI
ORF 103  Forward  GCGGATCCCATATG-AACCACGACATCAC  BamHI-NdeI
         Reverse  CCCGCTCGAG-CAGCCACAGGACGGC  XhoI
ORF 104  Forward  GCGGATCCCATATG-ACGTGGGGAACGC  BamHI-NdeI
         Reverse  CCCGCTCGAG-GCGGCGTTTGAACGGC  XhoI
ORF 105  Forward  GCGGATCCCATATG-ACCAAATTTCAAACCCCTC  BamHI-NdeI
         Reverse  CCCGCTCGAG-TAAACGAATGCCGTCCAG  XhoI
ORF 106  Forward  GCGGATCCCATATG-AGGATAACCGACGGCG  BamHI-NdeI
         Reverse  CCCGCTCGAG-TTTGTTCCCGATGATGTT  XhoI
ORF 109  Forward  GCGGATCCCATATG-GAAGATTTATATATAATACTCG  BamHI-NdeI
         Reverse  CCCGCTCGAG-ATCAGCTTCGAACCGAAG  XhoI
ORF 110  Forward  AAAGAATTC-ATGAGTAAATCCCGTAGATCTCCC  EcoRI
         Reverse  AAACTGCAG-GGAAAACCACATCCGCACTCTGCC  PstI
ORF 111  Forward  AAAGAATTC-GCACCGCAAAAGGCAAAAACCGCA  EcoRI
         Reverse  AAACTGCAG-TCTGCGCGTTTTCGGGCAGGGTGG  PstI
ORF 113  Forward  AAAGAATTC-ATGAACAAAACCCTCTATCGTGTGATTTTCAACCG  EcoRI
         Reverse  AAACTGCAG-TTACGAATGCCTGCTTGCTCGACCGTACTG  PstI
ORF 115  Forward  AAAGAATTC-TTGCTTGTGCAAACAGAAAAAGACGG  EcoRI
         Reverse  AAAAAAGTCGAC-CTATTTTTTAGGGGCTTTTGCITGTTTGAAAAGCCTGCC  SalI
ORF 119  Forward  AAAGAATTC-TACAACATGTATCAGGAAAACCAATACCG  EcoRI
         Reverse  AAACTGCAG-TTATGAAAACAGGCGCAGGGCGGTTTTGCC  PstI
ORF 120  Forward  AAAGAATTC-GCAAGGCTACCCCAATCCGCCGTG  EcoRI
         Reverse  AAACTGCAG-CGGTTTGGCTGCCTGGCCGTTGAT  PstI
ORF 121  Forward  AAAGAATTC-GCCTTGGTCTGGCTGGTTTTCGC  EcoRI
         Reverse  AAACTGCAG-TCATCCGCCACCCCACCTCGGCCATCCATC  PstI
ORF 122  Forward  AAAAAAGTCGAC-ATGTC2TACCGCGCAAGCAGTTCTCC  SalI
         Reverse  AAACTGCAG-TCAGGAACACAAACGATGACGAATATCCGTATC  PstI
ORF 125  Forward  AAAGAATTC-GCGCTGTTTTTTGCGGCGGCGTAT  EcoRI
         Reverse  AAACTGCAG-CGCCGTTTCAAGACGAAAAAGTCG  PstI


ORF 126  Forward  AAAGAATTC-GCGGAAACGGTCGAAG  EcoRI
         Reverse  AAACTGCAG-TTAATCTTGTCTTCCGATATAC  PstI
ORF 127  Forward  AAAGAATTC-ATGACTGATAATCGGGGGTTTACG  EcoRI
         Reverse  AAAAAAGTCGAC-CTTAAGTAACTTGCAGTCCTTATC  SalI
ORF 128  Forward  AAAGAATTC-ATGCAAGCTGTCCGCTACAGGCC  EcoRI
         Reverse  AAACTGCAG-CTArrGCAATGCGCCGCCGCGGGAATGITTGAGCAGGCG  PstI
ORF 129  Forward  AAAGAATTC-ATGGATTTTCGTTTTGACATTATTTACGAATACCG  EcoRI
         Reverse  AAACTGCAG-TTATTTTTTGATGAAATTTTGGGGCGG  PstI
ORF 130  Forward  AAAGAATTC-GCAGTACTTGCCATTCTCGGTGCG  EcoRI
         Reverse  AAACTGCAG-CTCCGGATCGTCTGTAAACGCATT  PstI
ORF 131  Forward  GCGGATCCCATATG-GAAATTCGGGCAATAAAAT  BamHI-NdeI
         Reverse  CCCGCTCGAG-CCAGCGGACGCGTTC  XhoI
ORF 132  Forward  GCGGATCCCATATG-AAAGAAGCGGGGTTTG  BamHI-NdeI
         Reverse  CCCGCTCGAG-CCAATCTGCCAGCCGT  XhoI
ORF 133  Forward  CGCGGATCCCATATG-GAAGATGCAGGGCGCG  BamHI-NdeI
         Reverse  CCCGCTCGAG-AAACTTGTAGCTCATCGT  XhoI
ORF 134  Forward  GCGGATCCCATATG-TCTGTGCAAGCAGTATTG  BamHI-NdeI
         Reverse  CCCGCTCGAG-ATCCTGTGCCAATGCG  XhoI
ORF 135  Forward  GCGGATCCCATATG-CCGTCTGAAAAAGCTTT  BamHI-NdeI
         Reverse  CCCGCTCGAG-AAATACCGCTGAGGATG  XhoI
ORF 136  Forward  CGCGGATCCGCTAGC-ATGAAGCGGCGTATAGCC  BamHI-NheI
         Reverse  CCCGCTCGAG-TTCCGAATATTTGGAACTTTT  XhoI
ORF 137  Forward  CGCGGATCCCATATG-GGCACGGCGGGAAATA  BamHI-NdeI
         Reverse  CCCGCTCGAG-ATAACGGTATGCCGCC  XhoI
ORF 138  Forward  GCGGATCCCATATG-TTTCGTTTACAATTCAGGC  BamHI-NdeI
         Reverse  CCCGCTCGAG-CGGCGTTTTATAGCGG  XhoI
ORF 139  Forward  GCGGATCCCATATG-GCTTTTTTGGCGGTAATG  BamHI-NdeI
         Reverse  CCCGCTCGAG-TAACGTTTCCGTGCGTTT  XhoI
ORF 140  Forward  GCGGATCCCATATG-TTGCCCACAGGCAGC  BamHI-NdeI
         Reverse  CCCGCTCGAG-GACGATGGCAAACAGC  XhoI
ORF 141  Forward  GCGGATCCCATATG-CCGTCTGAAGCAGTCT  BamHI-NdeI
         Reverse  CCCGCTCGAG-ATCTGTTGTTTTTAAAATATT  XhoI
ORF 142  Forward  GCGGATCCCATATG-GATAATTCTGGTAGTGAAG  BamHI-NdeI
         Reverse  CCCGCTCGAG-AAACGTATAGCCTACCT  XhoI
ORF 143  Forward  GCGGATCCCATATG-GATACCGCTTTGAACCT  BamHI-NdeI
         Reverse  CCCGCTCGAG-AATGGCTTCCGCAATATG  XhoI
ORF 144  Forward  GCGGATCCCATATG-ACCTTTTTACAACGTTTGC  BamHI-NdeI
         Reverse  CCCGCTCGAG-AGATTGTTGTTGTTTTTTCG  XhoI
ORF 147  Forward  GCGGATCCCATATG-TCTGTCTTTCAAACGGC  BamHI-NdeI
         Reverse  CCCGCTCGAG-TTTGTTTTTGCAAGACAG  XhoI


NB: 

- restriction sites are underlined 



- for ORFs 110-130, where the ORF itself carries an EcoRI site (e.g. ORF122), a SalI site
was used in the forward primer instead. Similarly, where the ORF carries a PstI site (e.g.
ORFs 115 and 127), a SalI site was used in the reverse primer.
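
Each primer in Table I is written as a 5' tail carrying the restriction site(s), a hyphen, and the ORF-specific portion. A minimal sketch of how the sites in a tail could be identified is given below; the recognition sequences are the standard ones for these enzymes, while the function names and the dictionary are illustrative and not part of the original table.

```python
# Recognition sequences of the enzymes named in Table I.
SITES = {
    "BamHI": "GGATCC",
    "NdeI": "CATATG",
    "NheI": "GCTAGC",
    "NcoI": "CCATGG",
    "XhoI": "CTCGAG",
    "EcoRI": "GAATTC",
    "PstI": "CTGCAG",
    "SalI": "GTCGAC",
}


def split_primer(primer):
    """Split 'TAIL-SPECIFIC' into (tail, ORF-specific part); spaces are ignored."""
    tail, _, specific = primer.replace(" ", "").upper().partition("-")
    return tail, specific


def sites_in_tail(primer):
    """Names of the restriction sites found in the 5' tail, in 5'->3' order."""
    tail, _ = split_primer(primer)
    found = [(tail.find(seq), name) for name, seq in SITES.items() if seq in tail]
    return [name for _, name in sorted(found)]


print(sites_in_tail("GCGGATCCCATATG-TTTGATTTCGGTTTGGG"))  # ['BamHI', 'NdeI']
print(sites_in_tail("CCCGCTCGAG-GACGGCATAACGGCG"))        # ['XhoI']
```

The function reports every listed recognition sequence present in the tail, so it may return a site that the table does not name for a given primer.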
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TABLE II - Summary of cloning, expression and purification 



ORF       PCR/cloning  His-fusion   GST-fusion   Purification
                       expression   expression

orf 1     +            +            +            His-fusion
orf 2     +            +            +            GST-fusion
orf 2.1   +            n.d.                      GST-fusion
orf 4                  +            +            His-fusion
orf 5     +            n.d.                      GST-fusion
orf 6     +            +            +            GST-fusion
orf 7                  +            +            GST-fusion
orf 8     +            n.d.         n.d.
orf 9     +            +            +            GST-fusion
orf 10    +            n.d.         n.d.
orf 11    +            n.d.         n.d.
orf 13    +            n.d.         +            GST-fusion
orf 15    +            +            +            GST-fusion
orf 17    +            n.d.         n.d.
orf 18    +            n.d.         n.d.
orf 19                 n.d.         n.d.
orf 20    +            n.d.         n.d.
orf 22    +            +            +            GST-fusion
orf 23    +            +            +            His-fusion
orf 24    +            n.d.         n.d.
orf 25    +            +            +            His-fusion
orf 26    +            n.d.         n.d.
orf 27    +            +            +            GST-fusion
orf 28    +            +            +            GST-fusion
orf 29    +            n.d.         n.d.
orf 32    +            +            +            His-fusion
orf 33    +            n.d.         n.d.
orf 35                 n.d.         n.d.
orf 37    +            +            +            GST-fusion
orf 58    +            n.d.         n.d.
orf 65    +            n.d.         n.d.
orf 66    +            n.d.         n.d.
orf 72    +            +            n.d.         His-fusion
orf 73    +            n.d.         +            n.d.
orf 75                 n.d.         n.d.
orf 76    +            +            n.d.         His-fusion
orf 79    +            +            n.d.         His-fusion
orf 83    +            n.d.         +            n.d.
orf 84                 n.d.         n.d.
orf 85    +            n.d.         +            GST-fusion


orf 89    +            n.d.         +            GST-fusion
orf 97    +            +            +            GST-fusion
orf 98    +            n.d.         n.d.
orf 100   +            n.d.         n.d.
orf 101   +            n.d.         n.d.
orf 102   +            n.d.         n.d.
orf 103   +            n.d.         n.d.
orf 104   +            n.d.         n.d.
orf 105   +            n.d.         n.d.
orf 106   +            +            +            His-fusion
orf 109   +            n.d.         n.d.
orf 110   +            n.d.         n.d.
orf 111   +            +            n.d.         His-fusion
orf 113   +            +            n.d.         His-fusion
orf 115   n.d.         n.d.         n.d.
orf 119   +            +            n.d.         His-fusion
orf 120   +            +            n.d.         His-fusion
orf 121   +            n.d.         n.d.
orf 122   +            +            n.d.         His-fusion
orf 125   +            +            n.d.         His-fusion
orf 126   +            +            n.d.         His-fusion
orf 127   +            +            n.d.         His-fusion
orf 128   +            n.d.         n.d.
orf 129   +            +            n.d.         His-fusion
orf 130   +            n.d.         n.d.
orf 131   +            +            +            n.d.
orf 132   +            +            +            His-fusion
orf 133   +            n.d.         +            GST-fusion
orf 134   +            n.d.         n.d.
orf 135   +            n.d.         n.d.
orf 136   +            n.d.         n.d.
orf 137   +            n.d.         +            GST-fusion
orf 138   +            n.d.         +            GST-fusion
orf 139   +            n.d.         n.d.
orf 140   +            n.d.         n.d.
orf 141   +            n.d.         n.d.
orf 142   +            n.d.         n.d.
orf 143   +            n.d.         n.d.
orf 144   +            n.d.         +            n.d.
orf 147   +            n.d.         n.d.
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CLAIMS 

1. A protein comprising an amino acid sequence selected from the group consisting of SEQ
IDs 2, 4, 6, and 8. 

2. A nucleic acid molecule which encodes a protein according to claim 1.

3. A nucleic acid molecule according to claim 2, comprising a nucleotide sequence selected
from the group consisting of SEQ IDs 1, 3, 5, and 7. 

4. A protein comprising an amino acid sequence selected from the group consisting of SEQ 
IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 
54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 
104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142,
144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 
184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 
224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 
264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 
304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342,
344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 
384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 416, 418, 420, 422, 
424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454, 456, 458, 460, 462, 
464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494, 496, 498, 500, 502, 
504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542,
544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574, 576, 578, 580, 582, 
584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614, 616, 618, 620, 622, 
624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 
664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 
704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742,
744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774, 776, 778, 780, 782, 
784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 
824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 
864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, and 892.

5. A protein having 50% or greater sequence identity to a protein according to claim 4.

6. A protein comprising a fragment of an amino acid sequence selected from the group
consisting of SEQ IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 
44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 
96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 
136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174,
176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 
216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 
256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 
296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 
336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374,
376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 
416, 418, 420, 422, 424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454, 
456, 458, 460, 462, 464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494, 
496, 498, 500, 502, 504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 
536, 538, 540, 542, 544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574,
576, 578, 580, 582, 584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614, 
616, 618, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 
656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 
696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 
736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774,
776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 
816, 818, 820, 822, 824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 
856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, and 892.

7. An antibody which binds to a protein according to any one of claims 4 to 6. 

8. A nucleic acid molecule which encodes a protein according to any one of claims 4 to 6.

9. A nucleic acid molecule according to claim 8, comprising a nucleotide sequence selected 
from the group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35,
37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 
89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129,
131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169,
171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209,
211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 
251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289,
291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329,
331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 
371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409,
411, 413, 415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449,
451, 453, 455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489,
491, 493, 495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 
531, 533, 535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 
571, 573, 575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 
611, 613, 615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649,
651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689,
691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 
731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 
771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 
811, 813, 815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849,
851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889,
and 891.

10. A nucleic acid molecule comprising a fragment of a nucleotide sequence selected from the
group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 
41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 
93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133,
135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173,
175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213,
215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253,
255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293,
295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333,
335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 
375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409, 411, 413, 
415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449, 451, 453, 
455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489, 491, 493,
495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 531, 533,
535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 571, 573, 
575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 611, 613, 
615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 
655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693,
695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733,
735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773,
775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813,
815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853,
855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, and 891.

11. A nucleic acid molecule comprising a nucleotide sequence complementary to a nucleic acid
molecule according to any one of claims 8 to 10.

12. A nucleic acid molecule comprising a nucleotide sequence having 50% or greater sequence
identity to a nucleic acid molecule according to any one of claims 8 to 11.

13. A nucleic acid molecule which can hybridise to a nucleic acid molecule according to any
one of claims 8 to 12 under high stringency conditions.

14. A composition comprising a protein, a nucleic acid molecule, or an antibody according to 
any preceding claim. 

15. A composition according to claim 14 being a vaccine composition or a diagnostic 
composition. 

16. A composition according to claim 14 or claim 15 for use as a pharmaceutical.

17. The use of a composition according to claim 14 in the manufacture of a medicament for the
treatment or prevention of infection due to Neisserial bacteria. 

this International application, as follows:

Invention 1. Claims: ((1-3) completely) and ((4-17) partially)

A protein comprising an amino acid sequence selected from the group
consisting of SEQ ID NOS: 2, 4, 6 and 8 or fragments thereof; a protein
having 50% or greater sequence identity to said protein; an antibody
binding said protein; a nucleic acid encoding said protein; a nucleic
acid comprising a sequence selected from the group consisting of
SEQ ID NOS: 1, 3, 5 and 7 or fragments thereof; a composition comprising
said protein, said nucleic acid or said antibody; the use of said
composition;

Inventions 2 to 104. Claims: ((4-17) partially)

Idem as subject 1 but limited to the ORFs corresponding to examples 
2-104 characterized by SEQ ID NOS: 9-892. 

(Invention 2 is limited to SEQ ID NOS: 9-10; Invention 3 is limited
to SEQ ID NOS: 11-18; Invention 4 is limited to SEQ ID NOS: 19-28;
...; Invention 104 is limited to SEQ ID NOS: 885-892).

In view of additional search fees paid, Inventions 5, 26, 55, 77 and 91 have
been further searched. 